chaim@taux01.UUCP (Chaim Bendelac) (08/18/88)
Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc.) have expressed the wish for portability standards. Many organizations are spending tremendous resources to promote such standards. Nothing new there.

I wondered whether there is room for another standard layer, specially designed for software DISTRIBUTION. Imagine an Intermediate Program Representation Standard (IPRS) along the lines of an intermediate language within a compiler. Language independent, architecture independent. The distributor compiles and optimizes his program with his favorite language compiler into its IPR, copies the IPR onto a tape and sells. The buyer uses a variation on 'tar' to unload the tape and to post-process the IPR program with the system-supplied, architecture-optimized IPRS-to-binary compiler backend.

No need for cumbersome source distributions, no more different binary copies of the software. Utopia! You introduce a weirdly new, non-compatible architecture? Just supply a standard Unix (a la X/Open or OSF or whatever) and an IPRS-to-binary backend, and you are in business. The Software Crisis is over!

:-) ? :-(

No free lunch, of course. The programmer still has to write "portable" software, which is a difficult problem. A truly language- and architecture-independent interface is almost as difficult to design as the old "universal assembler" idea. But with enough incentives, perhaps? Questions:

1. How desperate is the need for such a standard? (I know: GNU needs neither IPRSs nor ABIs...)
2. Assuming LOTS of need, how practical might this be?
3. What are the main obstacles? Economic? Political? Technical?
4. What are the other advantages or disadvantages?
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (08/18/88)
In article <891@taux01.UUCP> chaim@taux01.UUCP (Chaim Bendelac) writes:

| Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc) have
| expressed the wish for portability standards. Many organizations are spending
| tremendous resources to promote such standards. Nothing new there.
|
| I wondered, if there is no room for another standard layer, specially
| designed for software DISTRIBUTION. Imagine an Intermediate Program
| Representation Standard (IPRS) along the lines of an intermediate language
| within a compiler. Language independent, architecture independent.
| The distributor compiles and optimizes his program with his favorite
| language compiler into its IPR, copies the IPR onto a tape and sells.
| The buyer uses a variation on 'tar' to unload the tape and to post-process
| the IPR program with the system-supplied architecture-optimized IPRS-to-binary
| compiler backend.

This has been done before. The "UCSD Pascal" system was done this way, and Fortran (and I think Ada) compilers were created to generate the P-code (pseudo code). The original version of B I saw worked this way, and you could either interpret or compile to binary. The compile cycle was (a) 2-pass compile to P-code, (b) global machine-independent optimize of the P-code, (c) 2-pass compile to assembler, (d) peephole optimize of the assembler source, and (e) a two-pass assembler. This was slow, but it produced some very good code, and the interpreter actually ran fairly well after (b). The interpreter translated the text tokens into two-byte strings before execution, so it was useful if not blindingly fast.

As we got it, B didn't have the optimizers, but I added them when I was creating a derivative language, IMP, which used the same P-codes. A version of this for CP/M-*) was floating around the BBS's called IL/1.

-- 
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
henry@utzoo.uucp (Henry Spencer) (08/20/88)
In article <891@taux01.UUCP> chaim@taux01.UUCP (Chaim Bendelac) writes:

>... Imagine an Intermediate Program
>Representation Standard (IPRS) along the lines of an intermediate language
>within a compiler. Language independent, architecture independent.
>The distributor compiles and optimizes his program with his favorite
>language compiler into its IPR, copies the IPR onto a tape and sells.
>The buyer uses a variation on 'tar' to unload the tape and to post-process
>the IPR program with the system-supplied architecture-optimized IPRS-to-binary
>compiler backend.

The one problem I can think of is that it's tricky to build such a representation in which the front end doesn't need to know *anything* about the machine. Things like data-type sizes often have to be decided before the intermediate representation is generated, even if the details of the code generation get deferred. Perhaps not impossible, but tricky.

-- 
Intel CPUs are not defective,  | Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
cik@l.cc.purdue.edu (Herman Rubin) (08/20/88)
In article <891@taux01.UUCP>, chaim@taux01.UUCP (Chaim Bendelac) writes:

> Previous articles (under "Standard Un*x HW"/"Open Systems"/"ABI", etc) have
> expressed the wish for portability standards. Many organizations are spending
> tremendous resources to promote such standards. Nothing new there.
>
> I wondered, if there is no room for another standard layer, specially
> designed for software DISTRIBUTION. Imagine an Intermediate Program
> Representation Standard (IPRS) along the lines of an intermediate language
> within a compiler. Language independent, architecture independent.
> The distributor compiles and optimizes his program with his favorite
> language compiler into its IPR, copies the IPR onto a tape and sells.
> The buyer uses a variation on 'tar' to unload the tape and to post-process
> the IPR program with the system-supplied architecture-optimized IPRS-to-binary
> compiler backend.

This would be very useful, _but_ consider the following problems. I could probably list over 1000 hardware-type operations (I am not including such things as the elementary transcendental functions; only those things for which I can come up with a nanocode-type bit-handling algorithm, such as multiplication and division) which I would find useful. My decision as to which algorithm to use for a particular problem would be highly dependent on the timing of these operations. To give a simple example, one would be well-advised to avoid too many divisions on a CRAY. Integer division on a CRAY or a CYBER 205 is more expensive than floating point, and on the CRAY it is even necessary to work to ensure the correct quotient. Packing or unpacking a floating point number is trivial on some machines but much more difficult on others. Thus one cannot optimize a program without knowing the explicit properties of operations on the target machine. We used to have a CDC6500 and a 6600 at Purdue.
These machines had exactly the same instruction set, and unless there was a fancy speedup attempt using parallel I/O and computing in an unsafe manner, exactly the same results would occur. However, optimization was totally different.

I suggest instead that we have a highly flexible intermediate language, with relatively easy but flexible syntax, and a versatile macro processor. Such a language would be enough by itself in many situations, but I know of none in existence. An example of a macro is

	x = y - z

which I would like to treat as the (= -) macro. Then we could have various algorithms which an optimizing macro assembler could assemble and estimate the timing. Another advantage of something like this, and this is particularly relevant to this group, is that one can point out the multitudinous situations where simple hardware instructions not now available can greatly speed up operations. I personally consider the present "CISC" machines as RISCy.

> No need for cumbersome source-distributions, no more different binary copies
> of the software. Utopia! You introduce a weirdly new, non-compatible
> architecture? Just supply a Standard Unix (ala X/Open or OSF or whatever),
> an IPRS-to-binary backend, and you are in business. The Software Crisis
> is over!

See above. I think it will be simpler, but not what was proposed.

> :-) ? :-(
>
> No free lunch, of course. The programmer still has to write "portable"
> software, which is a difficult problem. A truly language- and architecture-
> independent interface is almost as difficult to design as the old "universal
> assembler" idea. But with enough incentives, perhaps? Questions:

The "universal assembler" is more practical, if it is written more like CAL with overloaded operators. The interface would require a macro processor, but could be done with little more. However, as I have pointed out, the portable software mentioned above cannot exist. The examples above are not for truly parallel machines.
On a parallel machine, how would one break a vector into the positive and negative elements, use a separate algorithm to compute a function for these cases, and put the results back together in the order of the original arguments? Something can be done, but I suggest one would be better served by kludging in additional hardware. Now if one is stuck with this situation, and does not have the additional hardware, an algorithm somewhat slower may be in order.

We must face the fact that there cannot be efficient portable software. We may be able to produce reasonably efficient semi-portable software, and we should try for that. I believe that the tools for that can be developed. We also should try to improve the hardware to be able to use the "crazy" instructions implementable in nanocode or hardwired. There are useful instructions, not present in the HLLs, which may be so slow as to be impractical if not done in hardware.

> 1. How desperate is the need for such a standard? (I know: GNU
>    does not need ISPRs nor ABIs...)
> 2. Assuming LOTS of need, how practical might this be?
> 3. What are the main obstacles? Economical? Political? Technical?
> 4. What are the other advantages or disadvantages?

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
mrspock@hubcap.UUCP (Steve Benz) (08/20/88)
>[Chaim Bendelac wrote:]
>>[Use intermediate language instead of a common binary code for
>> distribution purposes, and make a utility common to all machines
>> for translating this intermediate code to executable form.]

To which Henry Spencer replies:

> The one problem I can think of is that it's tricky to build such a
> representation in which the front end doesn't need to know *anything* about
> the machine. Things like data-type sizes often have to be decided before
> the intermediate representation is generated, even if the details of the
> code generation get deferred. Perhaps not impossible, but tricky.

I'm not sure what Henry talks about is really that big a problem. If the *source* is portable, then the intermediate code could be made portable. The worst case would be if one had to abstract a data type all the way back to the level of the source language. That sort of worst-case scenario would result in a compilational problem, but not a representational problem. (I.e., the onus would be on the vendor to fix the problem, not on the standards committee.)

I think the real bugaboo here will be system calls and the like. Granted, they are theoretically semantically identical, but in reality they're not. In order for such a scheme to work, vendors would still have to break down and come to an agreement on what the standard semantics of all system calls will be. And don't talk to me about graphical interfaces...

- Steve Benz
bzs@encore.UUCP (Barry Shein) (08/21/88)
Re: intermediate p-code for distribution...

Look into the Portable Standard Lisp effort at the University of Utah. This was one of their areas of effort: a LAP (Lisp Assembly Program, pseudo-asm generated by the compiler) which would be highly portable, allowing porting of the compiler and bootstrapping of the entire system.

-Barry Shein, ||Encore||
henry@utzoo.uucp (Henry Spencer) (08/24/88)
In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:

>> Things like data-type sizes often have to be decided before
>> the intermediate representation is generated, even if the details of the
>> code generation get deferred...
>
>I'm not sure what Henry talks about is really that big a problem,

Is the layout of structs in memory decided before or after the intermediate representation is generated? What about the results of "sizeof"? How is "varargs" handled? And so forth. If you try to build a completely machine-independent "intermediate" form, I think you will end up with something that looks very much like a tokenized version of the source. This might or might not be satisfactory for the original purposes, but an intermediate representation (in the usual sense of the word) it's not.

-- 
Intel CPUs are not defective,  | Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mitch@Stride.COM (Thomas Mitchell) (08/25/88)
In article <3503@encore.UUCP> bzs@encore.UUCP (Barry Shein) writes:
>
>Re: intermediate p-code for distribution...
                  ^^^^^^

Caution: p-Code and p-System have some prior use and perhaps trademark associated with them. Now called the "Power System" by Pecan of Brooklyn, it is an interpreted OS. It had its origins at UCSD and is the origin of UCSD Pascal (aka Apple Pascal). Their p-Code (pseudo code) is portable from one machine to another. Their CODE file format has a field which lets the interpreter determine the byte sex of the p-code.

-- 
Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone: (702) 322-6868  TWX: 910-395-6073  FAX: (702) 322-7975
MicroSage Computer Systems Inc.
Opinions expressed are probably mine.
dick@ccb.ucsf.edu (Dick Karpinski) (08/26/88)
In article <1988Aug23.180420.28483@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>>> Things like data-type sizes often have to be decided before
>>> the intermediate representation is generated....
>... layout of structs in memory ... the results of "sizeof" ... "varargs" ...
>looks very much like a tokenized version of the source.

Indeed, it can be argued that the Gnu C Compiler's Register Transfer Language (gcc's RTL) does look like a tokenized version of the source. I don't mind that a bit. But would it work?? Would vendors fear to have their products cannibalized and reused in pieces? Probably not. Do I really understand correctly that this Stallman product accomplishes the hitherto unrealistic UNiversal Computer Oriented Language (UNCOL)?? Of course, in the presence of array/vector processors and the like, the universal part is a bit diminished, but still, does he have it right for the ordinary 32-bit workstation of today? I'm not at all sure, but it looks awfully good to me.

I would foresee a sort of validation suite to test both the gcc backend (with the machine description) and the specific properties of system calls on the target system. Since such a suite would tell it like it is, I presume that users would like it more than hardware vendors and the sales support staff.

Dick

Dick Karpinski  Manager of Minicomputer Services, UCSF Computer Center
UUCP: ...!ucbvax!ucsfcgl!cca.ucsf!dick  (415) 476-4529 (11-7)
BITNET: dick@ucsfcca or dick@ucsfvm  Compuserve: 70215,1277
USPS: U-76 UCSF, San Francisco, CA 94143-0704  Telemail: RKarpinski
Domain: dick@cca.ucsf.edu  Home (415) 658-6803  Ans 658-3797
tainter@ihlpb.ATT.COM (Tainter) (08/26/88)
In article <847@stride.Stride.COM> mitch@stride.stride.com.UUCP (Thomas Mitchell) writes:

>Caution: p-Code, p-System have some prior use and perhaps
>trademark associated with them. Now called the "Power System"
>by Pecan of Brooklyn it is an interpreted OS.
>It had its origins at UCSD and is the origin of UCSD Pascal (aka
>Apple Pascal).

There is a trademarked thing called "UCSD p-System". But p-code is not UCSD's. UCSD just did an extension of Wirth's original p-code system. P-code is how Wirth did his original implementation of Pascal. M-code is how he did his original implementation of Modula-2, and is what the Lilith runs. I don't know about the original Modula.

>Their [UCSD's] p-Code (pseudo code) is portable from one machine to
>another. Their CODE file format has a field which lets the
>interpreter determine the byte sex of the p-code.

Yup. And it cripples all programs down to 16-bit integers and 16-bit addresses (although text addresses can be fudged through segments). This is a nasty thing to do to a 680x0. The Macintosh version had a non-portable 32-bit integer extension, and Pecan has recently released a 32-bit version of the Power System. I doubt, though, that code is portable between the 16-bit and 32-bit versions.

What p-code is really good for is shoehorning onto small machines. P-code is very compact, and the segmentation allows some virtual memory, albeit at a significant performance hit. I also rather like units as a form of modularity.

>Thomas P. Mitchell (mitch@stride1.Stride.COM)
>Phone: (702)322-6868 TWX: 910-395-6073 FAX: (702)322-7975
>MicroSage Computer Systems Inc.

First it was Sage. Then it was Stride Micro. Now it's MicroSage?

--j.a.tainter
bpendlet@esunix.UUCP (Bob Pendleton) (08/26/88)
From article <1988Aug23.180420.28483@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):

> In article <2793@hubcap.UUCP> mrspock@hubcap.UUCP (Steve Benz) writes:
>>> Things like data-type sizes often have to be decided before

Usually you can get away with specifying the radix of the data and the minimum number of digits required. Sometimes you need to specify the maximum number of digits as well. For example, "short int x;" (a somewhat ambiguous declaration) can be translated into the portable form "x: static allocated signed binary min 16", or "char *name" can be represented as "name: stack allocated machine_pointer ASCII", or more loosely as "name: machine_pointer signed binary min 7 max 9". I think you can get the feel from these examples. The translator would translate declarations into constraints on the valid representations of the declared items.

>>> the intermediate representation is generated, even if the details of the
>>> code generation get deferred...
>>
>>I'm not sure what Henry talks about is really that big a problem,
>
> Is the layout of structs in memory decided before or after the intermediate
> representation is generated? What about the results of "sizeof"? How is
                                                          ^^^^^^

The layout of structs must be done by the machine-specific code generator, NOT by the translator. "sizeof" becomes a symbolic expression that can be evaluated by the code generator, but not by the translator. In one system I wrote, all data size computations were done in the linker. Worked out very well.

> "varargs" handled? And so forth.
   ^^^^^^^^^

Now that looks hard, for a minute. The general rule is that hardware-dependent problems must be pushed through to the hardware-dependent code generator. The machine-independent code for a varargs call could look something like this:

	vararg_block
	code for arg 1
	code for arg 2
	.
	.
	.
	code for arg n
	vararg_end N
	call what_ever

How did the translator find out it was a varargs call?
By looking at the declaration of the procedure and/or the way it was used. It's important to remember that this intermediate language must be usable by ALL programming languages, not just C.

> If you try to build a completely machine-
> independent "intermediate" form, I think you will end up with something that
> looks very much like a tokenized version of the source. This might or might
> not be satisfactory for the original purposes, but an intermediate represen-
> tation (in the usual sense of the word) it's not.

Off the top of my head I can think of two different intermediate forms that could be used for this. Each includes a symbol table; I hope you include a symbol table as part of your usual sense of the phrase "intermediate form."

One is a simple reverse Polish form of the source program. The operations can be generic like "+", and the operands can be indexes into the symbol table. This form can be converted directly to code, or into a more "normal" intermediate form by symbolic execution of the RPN. The intermediate values generated during symbolic execution can be constant values, registers, all sorts of things. By using fairly complex patterns to decide how to "execute" an operator, this approach can give you a surprisingly good quick-and-dirty code generator or a very machine-specific intermediate form suitable for machine-specific optimizations.

Another possible intermediate form is good old quads. A quad specifies 2 operands (well... sometimes just 1), an operation, and a destination. The operands and destinations can be other quads or variables. That is, quads are a way of representing a parse tree in a nice flat file.

Actually, both of these are simple ways to represent parse trees in flat files. Both forms make it possible to recover the original parse tree. You can do machine-independent optimization on both forms (though I think it's easier with quads). You can also do machine-independent linking of these forms. The problems are just not that big.
I used to spend a lot of time thinking about this kind of thing. My senior project, oh these many years gone by, was the design of a language for writing machine-independent LISP interpreters. I've looked very carefully at the PSL work at the University of Utah, since I was there when a lot of it was being done.

> -- 
> Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology
> they just act that way. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

I just got back from Xhibition; someone from OSF said they are planning to establish a standard for a portable intermediate language. Nice to see that the market is finally growing up enough to need something like this.

Imagine being able to buy a program, take it home, pop it into the drive, wait a few minutes while a machine-specific version is created from the machine-independent version on the disk, and then just use it. The only thing you have to worry about is whether or not your machine has enough horsepower to run the program well.

Will it ever happen? I doubt it, but it sure would be nice.

Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate: utah-cs!esunix!bpendlet
I am solely responsible for what I say.
chase@Ozona.orc.olivetti.com (David Chase) (08/27/88)
In article <1347@ucsfcca.ucsf.edu> dick@ucsfccb.UUCP (Dick Karpinski) writes: >Do I really understand correctly that this Stallman product accomplishes >the hitherto unrealistic UNiversal Computer Oriented Language (UNCOL)?? RTL isn't a UNCOL, no. RTL (as realized in the Gnu C compiler) contains all sorts of hard-coded register assignments (R15 is my SP) and calling conventions (my stack grows thataway). I'm afraid these make it rather non-universal. If you are interested in this sort of thing, you might check out papers by Fraser and Davidson in the compiler construction conferences of 1988, 1986, and 1984. David
aglew@urbsdc.Urbana.Gould.COM (08/27/88)
..> Pseudo-code as an exchange format between different architectures In a recent UNIX World Omri Serlin (I think) mentioned that OSF is considering something with a name like "Architecture Independent Exchange Format" as a challenge to the plethora of ABIs in the AT&T/SUN world.
henry@utzoo.uucp (Henry Spencer) (08/30/88)
In article <958@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:

>... The general rule is that hardware
>dependent problems must be pushed through to the hardware dependent
>code generator...

The trouble is that, in the real world, the part of the compiler that does not make hardware-dependent decisions is the easy and small part. The idea that parsing, type checking, etc. is a big deal is basically an academic prejudice in favor of things that are easy to formalize. These things aren't trivial, mind you, but they're not the hard part of a production-quality compiler. This is what brought on my comment about the "intermediate" form being little more than tokenized source.

What this would amount to, almost, is a sort of encrypted source. That's pretty much what's wanted for portable distributions that don't give away the farm or permit users to meddle. (Of course, this is anathema to the Stallmanites... Don't expect GNU to support such a portable distribution form.)

>I just got back from Xhibition, someone from OSF said they are planning
>to establish a standard for a portable intermediate language. Nice to
>see that the market is finally growing up enough to need something
>like this.

One can read this two ways, however: are they talking about standardizing the form, or the content? Standardizing the form makes it easy to build multiple compiler front ends feeding into the same code generator, but doesn't remove machine dependencies from the front ends or their output. (The PCC intermediate format is a de facto standard form, but its contents are machine-specific.) Standardizing the content is what we've been talking about. I can see OSF doing either.

>Imagine being able to buy a program, take it home and pop into the
>drive, wait a few minutes while a machine specific version is created
>from the machine independent version on the disk and then just use it.
>The only thing you have to worry about is whether or not your machine
>has enough horsepower to run the program well...

And about whether the programmer was competent enough to make the code really portable. Don't forget that condition. Since you haven't really got source, you can't (readily) go in and fix it if there's a problem. This new flexibility also opens the door to a whole new range of bugs, since the code can now be run on machines which the author never even compiled it on.

-- 
Intel CPUs are not defective,  | Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
rpw3@amdcad.AMD.COM (Rob Warnock) (08/30/88)
There are a number of companies out there these days who have made major advances in the art of emulation of one CPU on another, particularly when the emulated CPU is the IBM PC (of some flavor). These products do a variety of things (some products do one or more), including:

(1) pre-processing of the to-be-emulated program;
(2) straight emulation, but with caching of the "instruction decode" step;
(3) detection of basic blocks and optimization and caching of the whole basic block;
(4) [other things I don't know about...?].

When running the emulator on, say, a Mac or Sun workstation, the emulated speed can exceed the native speed of a PC/AT.

The thought occurs that if one designed a virtual "machine" that was specifically easy to emulate -- given these modern techniques -- that this *might* be a suitable form for "portable" object programs (as contrasted with some "universal intermediate form"). At least it bears some thought.

(Hmmm... how hard is it for each of the current RISC CPUs to emulate each of the others?)

Rob Warnock
Systems Architecture Consultant
UUCP: {amdcad,fortune,sun}!redwood!rpw3  ATTmail: !rpw3
DDD: (415) 572-2607
USPS: 627 26th Ave, San Mateo, CA 94403
mash@mips.COM (John Mashey) (08/31/88)
In article <22778@amdcad.AMD.COM> rpw3@amdcad.UUCP (Rob Warnock) writes:
.....
>(Hmmm... how hard is it for each of the current RISC CPUs to emulate each
>of the others?)

The R3000 is pretty easy to convert; actually, we used to convert it to VAX code all of the time, and we've thought about the conversions to some of the others. The hardest machines to convert FROM are those with condition codes (whether in a condition code register, or computed into another GP register). They've always been a pain for emulation, especially if there's any irregularity, and if the emulating machine doesn't have an almost identical set of conditions.

In fact, just today at lunch, this discussion came up, and I proposed the R3000 as the obvious architecture to use as the standard binary form ...... but there were 2 Sun folks and only one of me, so it got voted down :-)

-- 
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
bpendlet@esunix.UUCP (Bob Pendleton) (08/31/88)
From article <1988Aug29.202603.13897@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):

> In article <958@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>... The general rule is that hardware
>>dependent problems must be pushed through to the hardware dependent
>>code generator...
>
> The trouble is that, in the real world, the part of the compiler that
> does not make hardware-dependent decisions is the easy and small part.
> The idea that parsing, type checking, etc. is a big deal is basically
> an academic prejudice in favor of things that are easy to formalize.

Somehow I think I've just been called an academic. Perhaps because I mentioned some of my academic background in compiler writing. Oh well, if I didn't generally agree with your opinion of academic compiler writers I wouldn't feel so offended.

To make another academic reference, go way back into the literature and take a look at the Hearn and Griss LISP compiler. You might have to look for references to REDUCE to find the stuff. Their work indicates that ~90% of all optimization is architecture specific. They looked at register machines and stack machines and found that optimization was very dependent on whether you were targeting a stack machine or a register machine, but not on which specific register or stack machine.

If you want examples of real-world products that formalize code generation for complex machines, take a look at the Ada compiler from JRS, or the C compiler from QTC. Code generation can be formalized, and has been formalized. But I think a lot of academics haven't noticed. The January 21, 1988 issue of "Electronics" has several articles on QTC's product. I don't know if JRS is publishing anything.

> These things aren't trivial, mind you, but they're not the hard part
> of a production-quality compiler. This is what brought on my comment
> about the "intermediate" form being little more than tokenized source.

Ok, I see what you're saying.
Have you looked at the Ada Intermediate Language? I can't claim I've taken a close look, but it looks like the sort of thing we can expect to see. But I still don't agree with you. There are a lot of machine-independent optimizations that can be done on the intermediate form. There may also be a number of application-dependent optimizations that can be done.

> (Of course, this is
> anathema to the Stallmanites... Don't expect GNU to support such a
> portable distribution form.)

Why not? If I have a generic machine running GNUix (I know it's not called that), why should I be barred from buying a commercial software package? The Stallmanites may think it's wrong to sell software, but why would they try to stop me from buying software if I want to? I just plain don't understand Stallman. From reading the GNU manifesto it looks like he's trying to tell me that hardware engineers and hardware companies have a right to sell what they produce, but software engineers and software companies don't. This seems just plain insane to me. If anyone wants to set me right on this, do it by private mail. Please don't clutter the net with it.

> Standardizing the content is what we've been talking
> about. I can see OSF doing either.

Yes. I don't trust OSF, or AT&T/SUN, as far as I can throw a UNIVAC 1108 core memory cabinet (had to do something to tie this into architectures), but I don't think the OSF member companies would be willing to give up their proprietary architectures.

> And about whether the programmer was competent enough to make the code
> really portable. Don't forget that condition. Since you haven't really
> got source, you can't (readily) go in and fix it if there's a problem.
> This new flexibility also opens the door to a whole new range of bugs,
> since the code can now be run on machines which the author never even
> compiled it on.

Yep. Look at the Ada experience for perspective on this problem.
That's why I described declarations in the intermediate form as constraints. The intermediate form is going to have to be full of constraints. If the program has a bunch of declarations that require 36-bit one's-complement integers, you can be sure that it will run very slowly on a 68020. But you can make it run without error if those constraints are available to the target machine code generator.

I'll bet that a standard-conforming implementation on a 68020 will be allowed to refuse to translate a program that is a gross mismatch for the target machine.

			Bob P.
-- 
Bob Pendleton @ Evans & Sutherland
UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet
Alternate: utah-cs!esunix!bpendlet
I am solely responsible for what I say.
bpendlet@esunix.UUCP (Bob Pendleton) (09/01/88)
From article <22778@amdcad.AMD.COM>, by rpw3@amdcad.AMD.COM (Rob Warnock):
> The thought occurs that if one designed a virtual "machine" that was
> specifically easy to emulate -- given these modern techniques -- that
> this *might* be a suitable form for "portable" object programs (as
> contrasted with some "universal intermediate form"). At least it bears
> some thought.

There really is no difference between an easy-to-emulate virtual machine and a universal intermediate form. Quads map directly to a 3-address machine, reverse Polish maps to a stack machine, and trees can also be directly executed. (EVAL in microcode, anyone? Yes, I know it's been done.) If the intention is to translate it for the target machine anyway, why not use a form with as much information left in as possible?

The problems come from incompatible data formats, addressing modes, and high-level operations. Most modern computers use 8-bit bytes and support 16- and 32-bit two's-complement integers, so there isn't much of a problem there. Floating point, packed decimal, and fixed point might cause some problems.

Not all modern computers are byte addressable, so virtual machine code that assumes byte addressability isn't going to run well on a word-addressed machine. Also, if the virtual machine code lays out data in structures and arrays, you can take a serious performance hit if the virtual machine doesn't align data to match the addressing granularity of your machine.

If the virtual machine is specified at too low a level it will be very difficult to take advantage of any special instructions the target processor may have. Even something as simple as a block copy instruction may not be usable if the virtual machine code has loops to do copies.

So, if you have a universal intermediate language, you can write an emulator for it and execute it with a minimal amount of preprocessing, or you can convert it to native machine code, or you can build a machine that directly executes it.
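[The mappings Pendleton mentions -- reverse Polish onto a stack machine, quads onto a 3-address machine -- can be sketched with a toy evaluator. The opcodes and instruction layout below are invented for illustration; no real intermediate form is this small.]

```c
#include <assert.h>

/* Toy opcodes for a hypothetical stack-based intermediate form. */
enum op { PUSH, ADD, MUL, HALT };

struct insn { enum op op; int arg; };

/* Evaluate a reverse-Polish instruction stream on a small stack.
   The quad "t1 = a + b" becomes PUSH a, PUSH b, ADD here, which is
   why either notation can be emulated or translated directly. */
int rpn_eval(const struct insn *code)
{
    int stack[64];
    int sp = 0;                 /* next free stack slot */
    for (;; code++) {
        switch (code->op) {
        case PUSH: stack[sp++] = code->arg; break;
        case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case HALT: return stack[sp - 1];
        }
    }
}
```

For example, the expression (2 + 3) * 4 becomes the stream PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT.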
> Rob Warnock > Systems Architecture Consultant -- Bob Pendleton @ Evans & Sutherland UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet Alternate: utah-cs!esunix!bpendlet I am solely responsible for what I say.
chaim@taux02.UUCP (Chaim Bendelac) (09/02/88)
In article <891@taux01.UUCP>, chaim@taux01.UUCP (I) wondered:
> ...if there is no room for another standard layer, specially
> designed for software DISTRIBUTION. Imagine an Intermediate Program
> Representation ... language independent, architecture independent...

I asked:
> 3. What are the main obstacles? Economical? Political? Technical?
> 4. What are the other advantages or disadvantages?

Below is a [strongly edited] summary of the discussion so far. It seems to me that the problem-solvers outvoice the problem-raisers. I somehow have the feeling that we have not covered many of the problems. Let me try to be more specific:

Let the goal be an IR for distribution purposes, one that covers "traditional" architectures (you know what I mean) WITHOUT giving one a clear advantage over the others (a particular object format is out), that covers "popular" languages (C, Cobol, Modula-2, Pascal, Fortran, perhaps Ada and Lisp), that lives under a REAL Unix standard (AT&T's, or OSF's, or whoever's), and that does not attempt to handle "system code" -- ONLY "application programs" (rule of thumb: if you cannot write the program in any language but C, the program probably disqualifies).

Squeezing the last 1% of performance out of the system is NOT a goal, but the IR should be optimizable, both before and after distribution. "Tokenizing the source" is fine, if that allows me to write ONE single code generator for all these language translators out there, with a 100%-proof semantic definition of the IR. I want a R E A L separation. I want an "IR code-generator validation suite" to test my code generator, so that applications can be assured their stuff runs on my machine/architecture.

What are the constraints?
-- Chaim Bendelac (National Semiconductor Corporation) ---------------------------------------------------------------------------- Summary of current status of discussion: > From: henry@utzoo.uucp (Henry Spencer) it's tricky to build such a > representation in which the front end doesn't need to know *anything* about > the machine. Things like data-type sizes = From: henry@utzoo.uucp (Henry Spencer) Is the layout of structs in memory = decided before or after the IR is generated? What about "sizeof"? "varargs"? = you will end up with a tokenized version of the source. > From: bpendlet@esunix.UUCP (Bob Pendleton) you can get away with specifying > the data radix and the minimum number of digits required. "short int x;" > can be translated into "x: static allocated signed binary min 16". The > translator would translate declarations into constraints on the valid > representations of the declared items. > The layout of structs must be done by the machine specific code > generator. "sizeof" becomes a symbolic expression evaluated by the > code generator. In one system I wrote, all data size computations were done > in the linker. Worked out very well. The machine independent code for a > varargs call could look something like this: vararg_block > code for arg 1 > code for arg 2 > : > It's important to remember that this intermediate language must be > usable by ALL programming languages, not just C. > The problems are just not that big. = From: cik@l.cc.purdue.edu (Herman Rubin) I could probably list over 1000 = hardware-type operations which I would find useful. which algorithm to use = would be dependent on the timing of these operations. one cannot optimize = a program without knowing the explicit properties of the target machine. = We must face the fact that there cannot be efficient portable software. > From: mrspock@hubcap.UUCP (Steve Benz) I think the real bugaboo here will > be system calls and the like. 
Granted that they are theoretically identical, > but in reality, they're not so. = From: aglew@urbsdc.Urbana.Gould.COM In a recent UNIX World Omri Serlin = (I think) mentioned that OSF is considering something with a name like = "Architecture Independent Exchange Format" as a challenge to the plethora = of ABIs in the AT&T/SUN world. > From: henry@utzoo.uucp (Henry Spencer) in the real world, the part of the > compiler that does not make hardware-dependent decisions is the easy and > small part. What this would amount to, almost, is a sort of encrypted source. > And about whether the programmer was competent enough to make the code > really portable. Don't forget that condition. > This new flexibility also opens the door to a whole new range of bugs, > since the code can now be run on machines which the author never even > compiled it on. = From: dick@ccb.ucsf.edu (Dick Karpinski) the Gnu C's Register Transfer = Language (gcc's RTL) does look like a tokenized version of the source. = I would forsee a sort of validation suite to test both the gcc backend (with = the machine description) and system calls on the target system. > From: chase@Ozona.orc.olivetti.com (David Chase) RTL isn't a UNCOL, no. I'm > afraid these make it rather non-universal. = From: rpw3@amdcad.AMD.COM (Rob Warnock) Some companies have made major = advances in the art of emulation of one CPU on another, particularly when = the emulated CPU is the IBM PC. If one designed a virtual "machine" that was = specifically easy to emulate this *might* be a suitable form for "portable" = object programs (as contrasted with some "universal intermediate form"). > From: mash@mips.COM (John Mashey) The R3000 is pretty easy to convert; the > hardest machines to convert FROM are those with condition codes. ----------------------------------------------------------------------------- -- chaim@nsc
rminnich@super.ORG (Ronald G Minnich) (09/03/88)
In article <127@taux02.UUCP> chaim@taux02.UUCP (Chaim Bendelac) writes:
>In article <891@taux01.UUCP>, chaim@taux01.UUCP (I) wondered:
>> ...if there is no room for another standard layer, specially
>> designed for software DISTRIBUTION. Imagine an Intermediate Program
>> Representation ... language independent, architecture independent...

OK, everybody out there who remembers the IEEE attempt (what, circa 1978-9?) to promulgate a standard assembly language, raise your middle hand. :-)

I don't know about this, given that after 31 years I can't even get Fortran programs to move easily. Maybe we should see how far the ISO-oids get with their structured data streams??

ron
bcase@cup.portal.com (09/04/88)
Rob Warnock asks the soon-to-be 64 $ question:
|(Hmmm... how hard is it for each of the current RISC CPUs to emulate each
|of the others?)

Ah! Something to which I can speak informedly. I assume that Rob is really asking about binary recompilation, but the problems are similar for straight (not g... oh never mind) emulation. By far, the biggest problems in architectural emulation are:

1) Dealing with byte-sex differences (big vs. little endian, that is),
2) Dealing with alignment restrictions (everything aligned to natural boundaries?),
3) Dealing with indirect branches.

Number 2 is seldom a problem when you are talking about emulating a RISC on a RISC, since RISC architectures better reflect the realities of memory system design and call for each data type to be aligned on its natural boundary. However, when emulating a machine that allows 32-bit words to be aligned on byte boundaries on a machine that doesn't, say a 68020 on an 88000, a significant performance hit is taken unless exhaustive analysis is done. Even with exhaustive analysis, it might still be necessary to allow for the worst case (hmmm, this pointer is being passed and then dereferenced and I have no information about its origin, sh*t, I'll have to assume that it could be aligned any ol' which way).

Then there are indirect branches. What are the targets? If you can't tell, say by locating and decoding an associated switch branch address table or something, then the only 100% safe, bullet-proof thing to do is to assume that any instruction is a potential target of this branch. With that assumption forced upon your binary recompiler, many inter-instruction optimizations are prohibited. Another way of stating this problem: you can't tell where the basic blocks are. The indirect branch problem is not an architectural one, but one of loss of information: the basic block boundaries that were marked by labels in the intermediate or assembly form of the program are no longer explicitly marked.
Note that the C language allows a case within a switch to fall through to its lexical successor!!! Thus, for an intermediate language or virtual machine architecture to be usefully portable, this information must not be compiled or linked away.

As tough as this problem is, things like alignment restrictions and single-sized instructions, RISC characteristics for the most part, make this problem easier to handle. There are a couple of tricks, not necessarily without tradeoffs, that can be played. But I'm not telling....

Other things like lots of registers help, especially if the machine being emulated has a condition code definition different from the host machine's (virtually guaranteed; at least the SPARC has a bit that determines when the condition codes are modified, and, when they are, they are modified in the same way by all instructions).

Even under the best circumstances, you don't get something for nothing; emulation or recompilation isn't a panacea. If it were, we'd all just design our favorite machine and then buy Zycad simulators to "run" them!
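[The byte-sex problem bcase lists first is mechanical but pervasive: a recompiler or emulator hosting a wrong-endian guest has to wrap every word-sized load and store in a swap. A minimal sketch, not taken from any particular emulator:]

```c
#include <stdint.h>

/* Swap a 32-bit word between big- and little-endian byte order.
   An emulator running a big-endian guest on a little-endian host
   (or vice versa) must apply this on every word-sized memory
   access, which is where much of the slowdown comes from. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

Note that the swap is its own inverse, so the same routine serves for both directions.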
bcase@cup.portal.com (09/04/88)
Re: architectural emulation |The problems come from incompatable data formats, addressing modes, |and high level operations. Most modern computers use 8 bit bytes and |support 16 and 32 bit twos complement integers so there isn't much of |a problem there. Floating point, packed decimal, and fixed point might |cause some problems. Ha ha! I wish the problem were as easy as "most modern computers use 8 bit bytes ... so there isn't much of a problem there." Alignment and byte-sex incompatibilities can make it not worth the trouble. Floating- point can indeed be a problem: try making extendeds run fast on a machine that doesn't support them.... (SPARC does, does anyone else?)
pardo@june.cs.washington.edu (David Keppel) (09/04/88)
Yes, what we need is a high-level machine language that we can translate our programs into and then is sufficiently general that we can compile *that* efficiently into native machine code. How about C, the portable assembler? :-) :-) :-) ;-D on ( Looking for a Potable assembler ) Pardo -- pardo@cs.washington.edu {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
aglew@urbsdc.Urbana.Gould.COM (09/05/88)
..> A standard intermediate language for software distribution: >There really is no difference between an easy to emulate virtual >machine and an universal intermediate form. Quads map directly to a 3 >address machine, reverse polish maps to a stack machine, and trees can >also be directly executed. (EVAL in microcode anyone? Yes, I know its >been done.) If the intension is to translate it for the target machine >anyway, why not use a form with as much information left in as >possible? The form with as much information left in as possible is the source code. The purpose of an intermediate language for software distribution is to REMOVE information, namely that information that would let the customer reproduce the original program easily.
aglew@urbsdc.Urbana.Gould.COM (09/05/88)
>Yes, what we need is a high-level machine language that we can
>translate our programs into and then is sufficiently general that we
>can compile *that* efficiently into native machine code.
>
>How about C, the portable assembler? :-) :-) :-)
>
>	pardo@cs.washington.edu

Treating this seriously -- C isn't acceptable because the customer can take the C code and modify it relatively easily.

Why do we want intermediate code distributions? So that software vendors can sell code from which it is difficult or impossible to reproduce the original HLL code, therefore making it difficult for their customers to "steal" software products. The rest of us, who aren't software vendors, can and should continue to distribute source code.
chase@Ozona.orc.olivetti.com (David Chase) (09/07/88)
In article <28200195@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>
>>Yes, what we need is a high-level machine language that we can
>>translate our programs into and then is sufficiently general that we
>>can compile *that* efficiently into native machine code.
>>
>>How about C, the portable assembler? :-) :-) :-)
>>
>>	pardo@cs.washington.edu
>
>Treating this seriously - C isn't acceptable because the customer
>can take the C code and modify it relatively easily.
>

C loses technically, also. It does not allow a code generator to take the address of a label. It does not allow a code generator to reference the PC. It provides no guarantees about register assignment (not allocation; assignment). These three non-features preclude the use of some delightful tricks in the compilation of a language with exceptions and exception handling (yes, I know about setjmp; it allows a solution to our problem, but it is an inefficient solution).

C also loses as an intermediate language when the source language uses nested procedures. Again, you CAN do it, but it isn't very pretty.

C compilers also make pessimistic assumptions about aliasing, since there is no way for the front end to communicate what it knows to the C compiler. The C "volatile" keyword is also overkill. Efficient compilation of exceptions is helped by more detailed descriptions of volatile change and reference. (For example, "volatile out", meaning that writes cannot be optimized away, but reads can.) This can be achieved, painfully, by use of non-volatile temporaries to simulate caching of values in registers.

Even if the front end does get very clever and performs register allocation in C, it cannot know how many registers there are and it cannot know how they are organized (do floats and ints share registers? How about floats and doubles?).

See, I've been figuring out how to use C as an intermediate language in the last few months, and it really doesn't measure up.

David
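[Chase's "you CAN do it, but it isn't very pretty" for nested procedures usually means lowering to C with an explicit static link: the enclosing frame becomes a struct, and the inner routine takes a pointer to it for uplevel references. The names below are illustrative, not any real compiler's output:]

```c
/* Sketch of lowering a Pascal-style nested procedure to C.
   Source-level picture (hypothetical):
       function outer_sum(a, b): begin total := 0; bump(a); bump(b) end
       where bump(n) is nested inside outer_sum and updates total. */

/* The enclosing procedure's frame, made explicit. */
struct outer_frame { int total; };

/* The nested procedure, hoisted to top level; "link" is the
   static link giving access to the enclosing frame's variables. */
static void inner_bump(struct outer_frame *link, int n)
{
    link->total += n;       /* uplevel reference via the static link */
}

int outer_sum(int a, int b)
{
    struct outer_frame frame = { 0 };
    inner_bump(&frame, a);  /* every call site must thread the link */
    inner_bump(&frame, b);
    return frame.total;
}
```

The ugliness Chase alludes to is that every call must thread the link through, and deeper nesting needs a chain of such links.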
pardo@june.cs.washington.edu (David Keppel) (09/07/88)
>>>[ portable intermediate representation ]
>pardo@cs.washington.edu writes:
>>How about C, the portable assembler? :-) :-) :-)
aglew@urbsdc.Urbana.Gould.COM writes:
>[ High-level IR (intermediate representation) needed for distribution ]
>[ The customer can modify C too easily ]

I'll claim that I can pretty easily write a program that takes ordinary C programs and makes them gosh-almighty hard to understand. Here are some things I can do:

* Change names so that none of them are meaningful.
* Perform function inlining so that content is replicated (e.g., hard to find).
* Use random rewrite rules to change the structure of native constructs so that the code has no apparent style.
* Intentionally include dead code that is removed by the optimizer.
* Perform non-local data motion.
* Study the Obfuscated C Code competition very closely :-)

As a simple example:

	void foo(n)
	int n;
	{
	    int i;
	    for (i=0; i<n; ++i)
		howdy( "Hello, world\n" );
	}

	void x(p)
	int p;
	{
	    const char *m = "Hello, world\n";
	    int r;
	    int z;
	    r = 0;
	loop:
	    if (r>=p) return;
	    else { a (m), z=r++; goto loop; }
	    a (z);
	    exit (0);
	}

A fair optimizer should make the same code for both.

Now C is probably sub-optimal for some (language) distributions, but you can perform the same "tricks" with nearly any language. BTW, I have a tool that takes unoptimized (.o) output from pcc on a VAX and turns it back into something close to the original code. Having a high-level IR may not gain the vendors anything over an existing "standard" such as C.

;-D on ( Clear as a bell ) Pardo
-- 
pardo@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
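[The first of Pardo's tricks -- renaming so no identifier is meaningful -- is trivial to mechanize. A toy renamer might just hand out sequential opaque names; everything here is invented for illustration, and a real tool would drive this from a C lexer rather than being fed identifiers one at a time:]

```c
#include <stdio.h>
#include <string.h>

/* Toy symbol renamer: maps each distinct identifier it is given to
   an opaque name like "v0", "v1", ...  The same identifier always
   gets the same opaque name, so the renamed program still compiles.
   The table is a fixed-size linear-search list, good enough for a
   sketch. */
#define MAXSYMS 128

static char seen[MAXSYMS][64];   /* identifiers already encountered */
static int  nsyms;

const char *opaque_name(const char *ident)
{
    static char out[MAXSYMS][8];
    for (int i = 0; i < nsyms; i++)
        if (strcmp(seen[i], ident) == 0)
            return out[i];       /* already renamed: reuse it */
    strcpy(seen[nsyms], ident);
    sprintf(out[nsyms], "v%d", nsyms);
    return out[nsyms++];
}
```

Run over Pardo's example, this is how "howdy" becomes something like "a" (here "v0"): meaning survives, readability does not.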
yba@sword.bellcore.com (Mark Levine) (09/07/88)
In article <5642@june.cs.washington.edu> pardo@cs.washington.edu (David Keppel) writes:
>Yes, what we need is a high-level machine language that we can
>translate our programs into and then is sufficiently general that we
>can compile *that* efficiently into native machine code.
>

In days gone by I was quite fond of MARY, a language in use (still, I think) at the University of Trondheim and on various North Sea oil drilling platforms. It was one of several topics of ye olde Machine Oriented Languages Bulletin from IFIP, and even showed up at IFIP w.g. 2.4 while I watched. Since those days, I have seen few attempts at high-level machine-oriented languages. The folks at Tartan seemed to have the best shots at an intermediate representation for these purposes, based on Wulf et al.'s work at CMU.

MARY would require you to have an implementation prelude for each machine on which you compiled, to give machine-specific definitions of basic constructs (allowing for assembler-level optimization, since the language guarantees access to all machine ops). You could also supply preludes which defined the semantics of operations from other architectures and what to do with them on a new machine. The language is fully typed, has precedence-less left-to-right evaluation, and has the notion of a current value -- this makes it easy to program, if somewhat unnatural for those trained to think an assignment statement has the variable on the left.

There is an awful lot of good stuff on the shelf, but I think it stays there as long as proprietary interests remain larger than portability concerns. Given the recent movement toward UNIX OS standards, window system standards, and ignoring Ada, perhaps it is time for some archaeology; but how serious is anyone's interest in a really portable machine-oriented high-level language?
sher@sunybcs.uucp (David Sher) (09/07/88)
Out of curiosity, since I've not seen this question asked: if the problem is that source code is too easy to play with, why not distribute encoded source? Then one needs only create compilers with decoders built in for each machine. Since compilers are purchasable this seems possible (building in a decoder is trivial). This would be far more secure than any intermediate form, which can be messed with and redistributed (good luck proving that code x really = code y in court). I can imagine all sorts of idempotent program transformations on object code, say, that would render the program unrecognizable. I realize that you can always play this game with the compiled code (decoders in hardware, anyone?).

-David Sher
ARPA: sher@cs.buffalo.edu
BITNET: sher@sunybcs
UUCP: {rutgers,ames,boulder,decvax}!sunybcs!sher
mash@mips.COM (John Mashey) (09/07/88)
In article <848@sword.bellcore.com> yba@sabre.bellcore.com (Mark Levine) writes: >In article <5642@june.cs.washington.edu> pardo@cs.washingtone.edu (David Keppel) writes: >>Yes, what we need is a high-level machine language that we can >>translate our programs into and then is sufficiently general that we >>can compile *that* efficiently into native machine code. ..... >There is an awful lot of good stuff on the shelf, but I think it stays >there as long as proprietary interests remain larger than portability >concerns. Given the recent movement toward UNIX OS standards, window >system standards, and ignoring ADA, perhaps it is time for some archaeology; >but, how serious is anyone's interest in a really portable machine oriented >high level language? 1) high-level machine language definitely != C (suggested sometime earlier in this sequence.) Not enough semantics. 2) one interesting possibility is Stanford's U-code, which Fred Chow's dissertation used to show a machine-independent optimizer for several machines (Stanford MIPS, 68K, others). 3) MIPSco took this, and extended as necessary as C was beefed up (volatile needs to get thru, etc), and PL/1, COBOL, and Ada were added. A few things in the optimizer got slightly machine-dependent in the process, although I don't think inextricably so. 4) As I understand it [correct me if wrong], the HP Precision compilers started from Stanford U-code also, and I assume was extended, too. It is NOT that unreasonable to have an intermediate code that is language-independent (covering at least some languages), and reasonably target-independent. This is a great boon to vendors who like to support highly-integrated compiler systems, and their customers who like them also. Whether or not it solves the other problems that started this discussion is yet to be known. 
-- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
henry@utzoo.uucp (Henry Spencer) (09/08/88)
In article <963@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: >... Have you looked at the Ada Intermediate >Language? I can't claim I've taken a close look, but it looks like the >sort of thing we can expect to see. I saw some of the earlier stuff leading into it, but haven't looked at AIL itself. >> (Of course, this is >> anathema to the Stallmanites... Don't expect GNU to support such a >> portable distribution form.) > >Why not? Distribution with sources is Good. Distribution without sources is Evil. It's as simple as that. GNUnix might include something to handle such a form if it were truly universal, but this would be in the spirit of grudging adaptation to an unpleasant reality. Making it easier to send out software without source is precisely what Stallman does *NOT* want. >> This new flexibility also opens the door to a whole new range of bugs, >> since the code can now be run on machines which the author never even >> compiled it on. > >Yep, look at the Ada experience for perspective on this problem. Thats >why I described declarations in the intermediate form as constraints. >The intermediate form is going to have to be full of constraints. If >the program has a bunch of declarations that require 36 bit ones >complement integers, you can be sure that it will run very slowly on a >68020. But, you can make it run without error if those contraints are >available to the target machine code generator... You miss my point. 36-bit ints can be dealt with in the software fairly well. What I'm thinking of is much more subtle things that the compiler can't easily discover and put in the intermediate form, e.g. "this program depends on being able to dereference NULL pointers". Or, for that matter, "the details of the arithmetic in this program assume that integers are at least 36 bits". 
This is the sort of thing that will not be known to the compiler unless the programmer is explicit about it -- and lack of programmer attention to such details is exactly the problem. -- Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology they just act that way. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
ok@quintus.uucp (Richard A. O'Keefe) (09/08/88)
In article <1076@cs.Buffalo.EDU> sher@wolf.UUCP (David Sher) writes: >Out of curiosity since I've not seen this question asked, if the problem >is that source code is too easy to play with, why not distribute encoded >source. If you have a product which you want to make available on several systems, and the language you are using is the same on those systems, this is quite an effective method. We have an add-on product for our Prolog system which is distributed as encrypted source. This means that we have to maintain a single kit of files, not a separate kit for each supported machine type. There is, however, one reason why people might want to have a common intermediate form, and that is that the customers for one's product might not have the compiler you want. If you have Fortran/Pascal/C/Modula/.. compilers sharing a common intermediate representation and back end, then the customer only needs to have the back end to install a CIR distribution, but with encrypted source he needs each compiler.
henry@utzoo.uucp (Henry Spencer) (09/09/88)
In article <1076@cs.Buffalo.EDU> sher@wolf.UUCP (David Sher) writes: >Out of curiosity since I've not seen this question asked, if the problem >is that source code is too easy to play with, why not distribute encoded >source... Could be done, and I think it is being done by some groups. It is a weaker form of protection, though, because you can recover full source by decrypting, and the compilers have to know how to do that. Anything that the compiler knows how to do can, in principle, be analyzed and duplicated by a sufficiently determined programmer with a disassembler. Especially if he can then sell his decrypter as a commercial product. Look at how copy-protection schemes on PCs have fared. This is especially a problem if we're talking about a vendor-independent scheme, which of necessity has to be known to many people. The advantage of the "intermediate"-form approach is that quite a bit of information is lost during the transformation from source (e.g. one would presumably lose the spellings of most identifiers), and this information cannot be recovered just by being clever. -- Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology they just act that way. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bpendlet@esunix.UUCP (Bob Pendleton) (09/15/88)
From article <1988Sep7.210317.5781@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> In article <963@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>
> Distribution with sources is Good. Distribution without sources is Evil.

Oh, great... GOOD versus EVIL! Send me your definition of good and evil and how you feel about absolute moral values and I'll send you mine. Then we can debate the whole issue. But PLEASE do it off line. I've found that very few people are much interested in these kinds of debates.

>>68020. But, you can make it run without error if those contraints are
>>available to the target machine code generator...
>
> You miss my point. 36-bit ints can be dealt with in the software fairly
> well. What I'm thinking of is much more subtle things that the compiler
> can't easily discover and put in the intermediate form, e.g. "this program
> depends on being able to dereference NULL pointers". Or, for that matter,
> "the details of the arithmetic in this program assume that integers are
> at least 36 bits". This is the sort of thing that will not be known to
> the compiler unless the programmer is explicit about it -- and lack of
> programmer attention to such details is exactly the problem.

I've addressed this problem in another posting, but what the hey, I'll do it again.

To be truly portable, the intermediate form MUST address the issues you mention. Even if the source language doesn't define the semantics of dereferencing NULL pointers, the intermediate form must define the semantics of dereferencing NULL pointers. Otherwise, just as C code that counts on being able to dereference NULL pointers is not fully portable, an intermediate form that doesn't define the semantics of dereferencing NULL pointers will not be fully portable.
No matter what the source language, a compiler that generates a portable intermediate form will have to explicitly state the assumptions it is making about things like word size, arithmetic, and the semantics of dereferencing NULL pointers, or conform to the semantics defined for these things in the portable intermediate form. Otherwise it just won't work. Yes, that means that C compilers will have to put information into the intermediate form that does not derive from any programmer provided declarations. That indicates a flaw in C, not a problem with the idea of a portable intermediate language. Bob P.-- Bob Pendleton @ Evans & Sutherland UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet Alternate: utah-cs!esunix!bpendlet I am solely responsible for what I say.
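[Pendleton's "declarations as constraints" idea (elsewhere in the thread: "short int x;" becomes "x: static allocated signed binary min 16") implies a code generator that picks the cheapest native representation satisfying each constraint, or refuses on a gross mismatch. A minimal sketch, with the constraint record and the target's type sizes invented for illustration:]

```c
/* A declaration constraint as it might appear in the intermediate
   form: a minimum precision, not a concrete machine type. */
struct constraint { int min_bits; int is_signed; };

/* Code-generator side: return the width of the narrowest native
   integer on a hypothetical 8/16/32-bit byte-addressed target that
   satisfies the constraint.  Returns 0 to mean "refuse to
   translate" -- the gross-mismatch case Pendleton allows for
   (e.g. a demand for 36-bit one's-complement words on a 68020). */
int pick_native_width(struct constraint c)
{
    static const int native[] = { 8, 16, 32 };
    for (int i = 0; i < 3; i++)
        if (native[i] >= c.min_bits)
            return native[i];
    return 0;   /* no native type is wide enough */
}
```

So "signed binary min 16" maps to a native 16-bit integer on this target, while "min 36" is rejected rather than silently mistranslated.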
henry@utzoo.uucp (Henry Spencer) (09/18/88)
In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: >> Distribution with sources is Good. Distribution without sources is Evil. > >Oh, great... GOOD versus EVIL! Send me your definition of good and >evil and how you feel about absolute moral values and I'll send you >mine. Then we can debate the whole issue. But PLEASE do it off line. You miss the point; these are not my beliefs, but Richard Stallman's, and by extension, those of the Gnu project. This is why I'd expect them to be very unenthusiastic about anything that would facilitate sourceless distribution. We are indeed talking about absolute moral values, not mere considerations of tactics. >> ... What I'm thinking of is much more subtle things that the compiler >> can't easily discover and put in the intermediate form, e.g. "this program >> depends on being able to dereference NULL pointers". Or, for that matter, >> "the details of the arithmetic in this program assume that integers are >> at least 36 bits"... > >... To be truely portable the intermediate form MUST address >the issues you mention. Even if the source language doesn't define the >semantics of dereferencing NULL pointers, the intermediate form must >define the semantics of dereferencing NULL pointers. Unfortunately, it *can't*, without being machine-specific. Some machines allow it; some do not. If the intermediate form allows dereferencing NULL, then the intermediate form's pointer-dereference operation is inherently expensive on machines which do not permit dereferencing NULL, making it impossible to generate good code from the intermediate form. If the intermediate form forbids it, then the compilers must guarantee that no program will try to do so... which for normal compilers will boil down to inserting run-time checks, again making efficient code impossible. This is an inherently unportable issue, which an intermediate form MUST NOT try to resolve if it is to be both efficient and portable. 
>Yes, that means that C compilers will have to put information into the >intermediate form that does not derive from any programmer provided >declarations. That indicates a flaw in C, not a problem with the idea >of a portable intermediate language. This is like saying that the impossibility of reaching the Moon with a balloon indicates a flaw in the position of the Moon, not a problem with the idea of using balloons for space travel! All of a sudden, our universal intermediate form is useless for most of today's programming languages, unless the compilers are far more sophisticated than current ones. (NULL pointers are a C-ism, but deducing the size of integers that the program's arithmetic needs is a problem for most languages.) I assumed that we were talking about *practical* portable intermediate forms, ones that could be used with current languages and current compiler technology. -- NASA is into artificial | Henry Spencer at U of Toronto Zoology stupidity. - Jerry Pournelle | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
itcp@ist.CO.UK (News reading a/c for itcp) (09/21/88)
I have felt in my bones that an efficient Intermediate Language for conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC architectures) is a realistic proposition. This discussion has encouraged me to think that I am not alone (and thus less likely to be wrong?). As people have noted, it has to have something like the functionality of C, only with extensions to allow (where the source language required it) the specific semantics of a data type (storage size and address alignment) and operation (precision of operation). Use of these specifications may reduce performance on some architectures, so the IL includes unspecified versions where the semantics are `as loose as possible' to allow local optimisations (block move, particular integer length, etc.). The code generator for this IL is larger and more complex than for C. It may not be possible to support an architecture not in the list above efficiently - to try (in some standards committee composed of any processor manufacturer who wanted in) would doom the project. But, I would like to see this not for Software Distribution but for the development of Programming Languages. The definition of such an IL and the wide availability of code generators would do for Programming Languages what UNIX and C did for Processor Architectures. It provides a portability route that drastically reduces the time and costs of getting to market. By concentrating code generation in a single place it should also allow advances in code optimisation techniques and processor architecture design (unless the IL is not general enough and ends up constraining it :-( ). Good though this would be for the advancement of Computer Science I cannot see it being commercial. That is, I could not imagine a Company that produces the IL definition and sufficient code generators and compiler front ends to establish a momentum making a profit. :-(. Maybe Stallman and the FSF could do it; how they pay the rent beats me.
For software distribution I think David Keppel has the most pragmatic and cost effective solution - the use of obfuscated C source as the IL: From article <5655@june.cs.washington.edu>, by pardo@june.cs.washington.edu (David Keppel): >>[ High-level IR (intermediate representation) needed for distribution ] >>[ The customer can modify C too easily ] > > I'll claim that I can pretty easily write a program that takes > ordinary C programs and makes them gosh-almighty hard to understand. > [example follows] > [Usual disclaimer: this represents only my hastily assembled opinion and spelling, and not necessarily anyonelse's] Tom (itcp@uk.co.ist)
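A toy sketch of the "obfuscated C as IL" idea quoted above: systematically rename identifiers so the source still compiles but reads like line noise. The function and the name table below are invented for illustration; a real obfuscator would need a full C tokenizer and symbol table, and would also strip comments and reformat.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: replace every occurrence of the identifiers in
 * `names[i][0]` with the meaningless `names[i][1]`, leaving all other
 * text (operators, literals, whitespace) untouched. */
static void obfuscate(const char *src, char *dst,
                      const char *names[][2], size_t n_names) {
    size_t i = 0, out = 0;
    while (src[i]) {
        if (isalpha((unsigned char)src[i]) || src[i] == '_') {
            size_t j = i;                     /* scan one identifier */
            while (isalnum((unsigned char)src[j]) || src[j] == '_') j++;
            size_t len = j - i, k;
            const char *rep = NULL;
            for (k = 0; k < n_names; k++)     /* look it up in the table */
                if (strlen(names[k][0]) == len &&
                    strncmp(src + i, names[k][0], len) == 0)
                    rep = names[k][1];
            if (rep) { strcpy(dst + out, rep); out += strlen(rep); }
            else     { memcpy(dst + out, src + i, len); out += len; }
            i = j;
        } else {
            dst[out++] = src[i++];            /* copy non-identifier text */
        }
    }
    dst[out] = '\0';
}
```

Run on `total = total + count;` with a table mapping `total` to `O0O` and `count` to `OO0`, this yields `O0O = O0O + OO0;` - still legal C, much less legible.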
shankar@hpclscu.HP.COM (Shankar Unni) (09/24/88)
> I have felt in my bones that an efficient Intermediate Language for > conventional processors [(examples)] is realistic proposition.... > [Description of IL..] Such a piece of research was done years ago at Stanford ("Ucode". Exact reference not available at the moment). > Good though this would be for the advancement of Computer Science I > cannot see it being commercial. That is, I could not imagine a > Company that produces the IL definition and sufficient code generators > and compiler front ends to establish a momentum making a profit. :-(. Well, surprise, surprise. Both HP and MIPS use such an intermediate language for their RISC processors. And at least one of them is making a profit :-). (Disclaimer: I have no information on the other. No flames..) So there! -- Shankar.
bpendlet@esunix.UUCP (Bob Pendleton) (09/24/88)
>In article <970@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: >>> Distribution with sources is Good. Distribution without sources is Evil. >> >>> ... What I'm thinking of is much more subtle things that the compiler >>> can't easily discover and put in the intermediate form, e.g. "this program >>> depends on being able to dereference NULL pointers". Or, for that matter, >>> "the details of the arithmetic in this program assume that integers are >>> at least 36 bits"... >> >>... To be truly portable the intermediate form MUST address >>the issues you mention. Even if the source language doesn't define the >>semantics of dereferencing NULL pointers, the intermediate form must >>define the semantics of dereferencing NULL pointers. > >Unfortunately, it *can't*, without being machine-specific. Try this scenario: There are two kinds of computers in the world, brand X and brand Y. Brand X computers define the value pointed to by a NULL pointer to be a NULL value. That is, the load indirect instruction given the value that C uses for NULL is guaranteed to return NULL. On the other hand brand Y computers core dump if you try to load a value from the address that is equivalent to NULL. In all other respects X and Y computers are similar enough in word size, data formats, and so on, that software that doesn't dereference NULL ports easily from one brand of machine to the other. Let's assume that on both brands of computers people want code to run as fast as possible. So, the native code generators for the machines will generate the shortest possible code sequence for dereferencing a pointer. Of course they don't want to do run time checking to see if a NULL pointer is being dereferenced if they don't have to. A programmer uses brand X computers. He writes a pointer chasing program that assumes that *NULL == NULL. He's using a compiler suite that generates code in UIF (Universal Intermediate Form). 
Now he distributes the UIF to people with both brand X and brand Y computers. They run it through their UIF to machine code translators and run the code. What happens? Well, that depends on the definition of UIF. If UIF ignores the *NULL problem then the code will run on brand X computers and bomb on brand Y computers. But, if UIF allows a compiler to put a flag in the UIF that says that *NULL == NULL, or if UIF defines *NULL == NULL, then the code will run on brand Y machines, but with a speed penalty caused by the run time checks that the code generator had to insert to comply with the brand X compiler's request that *NULL == NULL. So, the compiler that runs on brand X machines must, at least, put a flag in the UIF stating that dereferencing NULL is allowed. The compiler on brand Y machines should state that dereferencing NULL is not allowed. That way the code can be made to run on any machine, though with a performance hit when the original compiler's assumptions don't match the reality of a specific machine. Obviously the compilers and code generators for brand X machines are going to be set up to produce good code for brand X computers, and the same is true for brand Y computers. But, it is still possible for UIF code generated for one machine to be translated to be run on the other machine. So, to restate what I've said so many times (am I getting boring yet?): UIF must, at the very least, require that machine dependent assumptions be stated in the UIF form of a program. If the assumptions made by the original compiler and the target machine are a close match then the program will run efficiently on the target machine. If the assumptions don't match then the program will still run, it just won't run as fast as it might have. This means that the UIF is not machine specific, but programs that make machine specific assumptions will pay a penalty when they are run on machines that don't support their assumptions.
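A minimal sketch of the speed penalty described in this scenario, with all names invented here for illustration: what a brand Y back end (where dereferencing NULL traps) might emit when the producing compiler's UIF carries a "*NULL == NULL assumed" flag, versus the plain load either back end emits when the flag is absent.

```c
#include <stddef.h>

/* Hypothetical illustration of the UIF flag scenario.  On a machine
 * where loading through address 0 traps, honoring a producer-side
 * assumption that *NULL == NULL forces a run-time check on every
 * pointer dereference -- Bob's "performance hit". */

/* Emitted when the flag is set: checked load, *NULL yields 0. */
static int checked_load(const int *p) {
    return (p == NULL) ? 0 : *p;   /* the inserted run-time check */
}

/* Emitted when the flag is absent: a bare load, which would trap on
 * brand Y hardware if p were NULL. */
static int plain_load(const int *p) {
    return *p;
}
```

The checked version costs a compare and branch per dereference; that per-operation overhead is exactly why Henry argues the intermediate form cannot settle this question once and for all without penalizing one class of machines.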
>If the intermediate form allows dereferencing NULL, then the >intermediate form's pointer-dereference operation is inherently >expensive on machines which do not permit dereferencing NULL, making >it impossible to generate good code from the intermediate form. It would seem that our definitions of "good code" are very different. My definition requires that the code do what I said to do. As I've tried to point out, not everything I say in a program is explicit in the source code. Several critical declarations are made by default based on the computer I'm using, the compiler I'm using, and the operating system I'm using. A complete set of declarations for a program includes all these things. For a compiler to generate code that matches the complete declaration of a program on a machine other than the one it was designed for may require that code sequences be generated that slow the program down. That's engineering, folks, but it isn't impossible. By my definition, it's even good. I would prefer that programmers not write code that does things like dereferencing NULL. But, if the language allows, I want to support it and make it portable. >>Yes, that means that C compilers will have to put information into the >>intermediate form that does not derive from any programmer provided >>declarations. That indicates a flaw in C, not a problem with the idea >>of a portable intermediate language. > >This is like saying that the impossibility of reaching the Moon with a >balloon indicates a flaw in the position of the Moon, not a problem with >the idea of using balloons for space travel! This is a very good example of the use of a false analogy to build a strawman argument. >All of a sudden, our >universal intermediate form is useless for most of today's programming >languages, unless the compilers are far more sophisticated than current >ones. (NULL pointers are a C-ism, but deducing the size of integers that >the program's arithmetic needs is a problem for most languages.)
This is a good example of justifying a false conclusion with a false premise. I can't find anything about requiring compilers to deduce number ranges in anything in my author_copy file. What I keep saying is that the compiler must explicitly state its ASSUMPTIONS in the UIF form of a program. If the compiler can deduce number ranges, then it would be nice if it passed that information along in the UIF. If the compiler assumes that NULL can be dereferenced, as it would on a computer with hardware that allows it, then the compiler must state that fact in the UIF it generates. >I assumed that we were talking about *practical* portable intermediate >forms, ones that could be used with current languages and current compiler >technology. An ad hominem attack on my credibility? Incredible! But I'll address it anyway. No, I've been talking about old languages like C, COBOL, BASIC, LISP, FORTRAN, Pascal, and MODULA-2. I've worked on compilers or interpreters for, or in, all of these languages. These languages comprise a small subset of the off-the-wall languages I've used and/or implemented over the last 17 years. So I'm convinced I know a little something about them. Anyway, it's very hard to keep up with all the current languages being developed; there are so many of them. :-) As for practical, I've already cited examples of commercial products that aren't far from using UIF already. One of the problems I think we've had with this entire exchange is that it has centered around C. C is not yet standardized, and because it was intended to be a systems programming language C has always tolerated machine dependent variations in the semantics of some of its operators. I believe the variation has been tolerated because it was believed to be justified by the resulting increase in speed. I believe Henry published a paper that showed that using better algorithms is much better than using nonportable hardware features.
If this discussion had centered around COBOL or BASIC there would have been little to discuss because the standards for these languages already require source level declarations that solve most of the problems we have been discussing. In the long run I think that the kind of discipline that could result from the use of a UIF would be a very good thing. Bob P. -- Bob Pendleton @ Evans & Sutherland UUCP Address: {decvax,ucbvax,allegra}!decwrl!esunix!bpendlet Alternate: utah-cs!esunix!bpendlet I am solely responsible for what I say.
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/25/88)
In article <340@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes: : >I have felt in my bones that an efficient Intermediate Language for >conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC >architectures) is realistic proposition. This discussion has encouraged me : >As people have noted it has to have something like the functionality of >C, only with extensions to allow (where the source language required >it) the specific semantics of a data type (storage size and address >alignment) and operation (precision of operation). Use of these >specifications may reduce performance on some architectures so the IL : >Good though this would be for the advancement of Computer Science I >cannot see it being commercial. That is, I could not imagine a : I know many people will argue with this, so feel free to argue - but here goes anyway (Hugh LaMaster's $.02): Prediction: In 4-6 years vector microprocessors will be "conventional" - they will not have replaced current architectures, but they will be out there, and will be fairly cheap. Request: If anyone is actually contemplating creating such a fairly portable IL, please include linear arrays (of whatever your basic data types are) in your IL, so people can write conforming vectorizing front ends and vector generating code generators. The cost of including it in the IL is practically nil. It is trivial to generate code for a non-vector machine from vectorized IL, and, in fact, it makes certain optimizations much easier, so it will usually result in faster code on pipelined machines. Non-vectorizing front-ends will still work just fine on vector machines (the generated code will not be as fast as possible, of course). Also, please do not make assumptions about the size of addresses, or the interchangeability of integers, addresses, chars, or floating point, in your IL.
It should not be necessary, and, a smart code generator will be able to optimize as necessary for specific architectures. Please note that CDC has presumably come up with an IL which is portable in the above sense: In order to solve the MxN problem for their machines, they decided to build a set of compilers around common, portable front ends (written in a common C-like (but not C) language), with vector constructs, and with the ability to target multiple back end machines with different integer and address sizes ( I believe all the target machines have 64 bit floating point, but I don't know if any assumptions are made there). Anyway, I don't know if the IL has been published. I am sure the compilers haven't been! (How many person-years of development?) Anyway, I think that this, and other examples, may be an existence proof. But there are many subtleties to defining a portable IL, and I don't think it is a trivial job. -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/25/88)
In article <650004@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes: >Well, surprise, surprise. Both HP and MIPS use such an intermediate language >for their RISC processors. And at least one of them is making a profit :-). >(Disclaimer: I have no information on the other. No flames..) So there! I am not sure what Kuck and Associates uses in their vectorizer, or what Pacific Sierra Research uses, but there are other commercial products out there from third parties. Have the IL's for any of these ever been published? Would the IL itself be considered to be proprietary, or just the code which uses it? -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
cik@l.cc.purdue.edu (Herman Rubin) (09/25/88)
In article <15440@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes: > In article <340@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes: > : > >I have felt in my bones that an efficient Intermediate Language for > >conventional processors (MC680xx, iAPX386, VAXen, NS32xxx and all RISC > >architectures) is realistic proposition. This discussion has encouraged me An intermediate language should exist, which should include everything that these machines and others can do. But we should realize that many, if not most, machine operations do not exist on many machines. > >As people have noted it has to have something like the functionality of > >C, only with extensions to allow (where the source language required > >it) the specific semantics of a data type (storage size and address > >alignment) and operation (precision of operation). The differences are even greater. There are operations which are hardware on some machines, and which are so clumsy, difficult, or expensive on others that any decision as to whether or not to use them should be highly machine specific. I know of no machine for which I would attempt to restrict a programmer to HLLs. .............. > I know many people will argue with this, so, feel free to argue - > but here goes anyway (Hugh LaMaster's $.02): > > Prediction: In 4-6 years vector microprocessors will be "conventional"- > they will not have replaced current architectures, but they will be out there, > and will be fairly cheap. I agree that vector microprocessors will be fairly cheap. But which type of architecture? I am familiar with several of them. I have used the CYBER 205, and it has useful instructions which are not vectorizable at all, or vectorizable only with difficulty and at considerable cost, on vector register machines. Or will we be using massive parallelism? Try procedures which are necessarily branched on vector or (even worse) parallel processors.
Some of them can be reasonably done on stream machines, but they are likely to be difficult on vector register machines, and almost unworkable on SIMD machines. An IL should be highly expressive and with an easy-to-use (from the human standpoint) syntax. But if it is good, many of its features will be directly usable only on a few machines. There seem to be more useful constructs, Hugh, than are in your philosophy. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
bcase@cup.portal.com (09/26/88)
>> I have felt in my bones that an efficient Intermediate Language for >> conventional processors [(examples)] is realistic proposition.... >Such a piece of research was done years ago at Stanford ("Ucode". >Exact reference not available at the moment). Well, Ucode doesn't really fit the needs here, and certainly "machine-independent representation for the distribution of application programs" wasn't the point behind Ucode. For something closer, but not quite there yet, see the work done at DECWRL: "The Mahler Experience: Using an Intermediate Language as the Machine Description," by David Wall & Michael Powell, WRL Research Report 87/1; this is just one piece of the great research done for the Titan/MultiTitan project. Note that one of the major benefits of a MIIL is that a manufacturer can release a new version of a machine and the poor users of the old machine won't have to throw away all their software if they want the features of the new machine. Notice that this is the promise and, by and large, the delivery of the current IBM PC and Mac lines, but the level of compatibility, at the processor instruction set, is too low. At least the Mac II lets you install display systems with impunity (and type in any combination up to six at once!) *without* having to manually install different drivers in every application, etc. etc. But note further that a MIIL can't be limited to just a standard for expressing application algorithms, it must also specify a great deal about the operating system (geeze, call it BIOS or TOOLBOX if you're a little insecure at this point). For the Mac line, this should not be a very hard thing to do; the IBM PC world is a little more cloudy. As an example of this, I was always pleased that I could take a binary program from 4.2 BSD and run it on Dec's Ultrix, most of the time. I know that doesn't exhibit a MIIL, but it does show what kind of operating system specification is needed.
As someone said earlier in a posting, what we need for processor instruction sets is what UNIX provides for computer operating systems (please no flames about how *good* UNIX is; I am just trying to say that the idea of a standard *interface* is there). And, we don't have to have just *one* MIIL, why not have many? Then, if you want access to a certain application, you must have the compiler for the MIIL in which it is written. This allows more money to be charged for the more sophisticated MIILs, thus satisfying the marketing types among us. And it also reflects the *fact* that no one MIIL will be sufficient for all time. Instead of embellishing *one* MIIL forever, until it becomes a CISC, we can have one MIIL for simple, procedural languages, C, PASCAL, etc., one for object-oriented languages, one for ADA (for which we can charge MEGA BUCKS because the military will want it!!!), etc. etc. Also, if MIIL specifications are made public, we can all compete for the MIIL market by writing better (faster, smaller, etc.) MIIL compilers. A new market is waiting to be tapped! The existence of MIILs doesn't cut revenue, it increases it! Even with a MIIL for every area, there will still only be a few, and having a few compilers on your system is not a big deal (or won't be soon), and they can even be kept off-line if necessary (unless the compilation is done on-the-fly). Note that with the right metaphors (such as that on the Mac, "double clicking"), the operating system can discover that the application hasn't been compiled from MIIL to native code and do that automatically. "Please wait: installing application. XX seconds 'till installation complete." *THIS* is the way to do it. *This* is the way computers should work. Whenever I tell a layperson (but computer user) that I have been working on a way to let new computers run old software, they ask why it hasn't always been that way....
Think of software for computers as gasoline for automobiles and you understand why the layperson is mad that IBM PCs can't run Apple software! What would you think if Arco gasoline only worked in economy cars!
itcp@ist.CO.UK (News reading a/c for itcp) (09/26/88)
From article <650004@hpclscu.HP.COM>, by shankar@hpclscu.HP.COM (Shankar Unni): >>[I write ..] >> cannot see it being commercial. That is, I could not imagine a >> Company that produces the IL definition and sufficient code generators >> and compiler front ends to establish a momentum making a profit. :-(. > > Well, surprise, surprise. Both HP and MIPS use such an intermediate language > for their RISC processors. And at least one of them is making a profit :-). > (Disclaimer: I have no information on the other. No flames..) So there! > -- > Shankar. 1. What I am interested in is a many (languages) to many (processors) Intermediate Language, and one made public for use by processor or language designers. 2. Both MIPS and HP are basically hardware vendors; sure, they may make a profit, but on their compiler operation? Tom
pb@nascom.UUCP (Peter Bergh) (09/26/88)
To contribute further to the existence proof, Sperry (now part of Unisys) have developed a set of compilers that (when I last was involved) comprised C, Cobol, Fortran (for two architectures), Pascal, and Plus (a Sperry systems-programming language) for the Sperry Univac 1100 series and that used a reasonably portable intermediate language. The main design goal for the intermediate language, though, was not to make it portable between machine architectures but to make it handle a large subset of the currently existing languages (it handles PL/I but not all of Ada).
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (09/26/88)
In article <944@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >An intermediate language should exist, which should include everything that >these machines and others can do. But we should realize that many, if not >most, machine operations do not exist on many machines. > >I agree that vector microprocessors will be fairly cheap. But which type of >architecture? I am familiar with several of them. I have used the CYBER 205, >and it has useful instructions which are not vectorizable at all, or vectoriz- >able only with difficulty and at considerable cost, on vector register >machines. Or will we be using massive parallelism? Try procedures which : >machines. An IL should be highly expressible and with an easy-to-use (from >the human standpoint) syntax. But if it is good, many of its features will >be directly usable only on few machines. There seems to be more useful >constructs, Hugh, than are in your philosophy. I fear that I may have been misunderstood. I do not think that a portable IL (PIL) can be developed which can efficiently use all the features of a given architecture, very especially new, poorly understood architectures that involve massive parallelism with limited communication between processors. My point is that portable ILs are already in use, both explicitly and implicitly, that "vectors" could simply be included in a new IL, and that it would be worth doing. No current IL can optimally mediate between the source language and a particular architecture, and yet they are useful because they do a good enough job in many circumstances, and they make it easier to port compilers, especially lesser-used compilers that might otherwise never become available at all, to new architectures. Many people are using gcc, not because it produces optimal code for the VAX, but because the code it produces is good enough, and some compilers have become available to people through it which would not be otherwise available.
To carry the question about vectors further, it should not be necessary to know whether the machine has vector registers or a memory to memory architecture. The IL would simply represent vector operations as memory to memory operations, leaving register assignments to the code generator. It is true that some architectures would not be well used by such a scheme, but my guess is that you could get 30% of the performance of a machine specific compiler this way, and that would be good enough in many cases, and a significant improvement over the current situation, where portable compilers get a 0% improvement over scalar code. This is not an idealistic pursuit of the ideal IL, but a practical approach to solving the time/time tradeoff (how much programmer time can I afford to spend to get how much speedup of my program?) in the near term vector capable microprocessor world. BTW, the old CDC/ETA compiler did not detect the case of finding the maximal element of a vector and returning its index as one operation (a common operation - the instruction is there to support it) and instead used two vector operations, taking twice as long. It is not trivial to make optimal use of an architecture even if you don't have a PIL to worry about; this is, of course, why the "RISC" word appeared somewhere in this discussion. Since one of the tenets of the RISC philosophy is to usually exclude instructions which can't be easily generated by a compiler, RISC architectures tend to make PILs more practicable. -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
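A sketch of the memory-to-memory view Hugh describes, with all names invented here: two vector "IL ops" written out as the naive scalar lowering a non-vector back end might produce. A vector back end would map each to hardware instructions and pick register assignments itself; note the max-with-index case is kept as ONE op so a target with a fused instruction (the case the CDC/ETA compiler missed) can use it.

```c
#include <stddef.h>

/* Illustrative scalar expansions of two hypothetical memory-to-memory
 * vector IL operations. */

/* dst[i] = a[i] + b[i]: a memory-to-memory vector add. */
static void il_vadd(double *dst, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Find the maximal element and return its index as a single IL op, so
 * a code generator can emit one fused instruction where one exists
 * instead of two separate vector operations. */
static size_t il_vmaxidx(const double *v, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best]) best = i;
    return best;
}
```

Even this trivial lowering shows the point about cost: including the vector forms in the IL obliges a scalar target to nothing more than a loop, while a vector target gets the information it needs for real speedups.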
brooks@maddog.llnl.gov (Eugene Brooks) (09/27/88)
In article <944@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: >I agree that vector microprocessors will be fairly cheap. But which type of >architecture? I am familiar with several of them. I have used the CYBER 205, >and it has useful instructions which are not vectorizable at all, or >vectorizable only with difficulty and at considerable cost, on vector register >machines. Or will we be using massive parallelism? Try procedures which Just what instructions are we talking about here? Let's pick an alternative to compare to, say the CRAY XMP 48 instruction set.
rogerk@mips.COM (Roger B.A. Klorese) (09/28/88)
In article <345@istop.ist.CO.UK> itcp@ist.CO.UK (News reading a/c for itcp) writes: >1. What I am interested in is a many (languages) to many (processors) > Intermediate Language, and one made public for use by procesor or language > designers. Our UCODE predates our processor, is based on a theoretical machine which is architecturally unlike our processors, and, while currently processor specific in implementation, need not continue to be. >2. Both MIPS and HP are basically hardware vendors, sure they may make a > profit, but on their compiler operation? MIPS is a systems and technology company. We license our compilers to several vendors who use our chips to build their own systems. (In fact, since we now sell our chips through technology partners, royalties and compiler licenses constitute our revenue in some of these deals.) In this case, yes, we do make money on our compilers. -- Roger B.A. Klorese MIPS Computer Systems, Inc. {ames,decwrl,prls,pyramid}!mips!rogerk 25 Burlington Mall Rd, Suite 300 rogerk@mips.COM (rogerk%mips.COM@ames.arc.nasa.gov) Burlington, MA 01803 I don't think we're in toto any more, Kansas... +1 617 270-0613
prl@iis.UUCP (Peter Lamb) (09/29/88)
In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: > >Try this scenario: > >There are two kinds of computers in the world, brand X and brand Y. >Brand X computers define the value pointed to by a NULL pointer to be >a NULL value. That is, the load indirect instruction given the value >that C uses for NULL is guaranteed to return NULL. On the other hand >brand Y computers core dump if you try to load a value from the >address that is equivalent to NULL. Actually there are 4 types; add to the above two types a machine which returns constant garbage when you dereference NULL, and one which returns whatever you last wrote into location NULL. Unix also runs on machines like these... > >In all other respects X and Y computers are similar enough in word >size, data formats, and so on, that software that doesn't dereference >NULL ports easily from one brand of machine to the other. That is to say, correct software ports correctly. > .... >A programmer uses brand X computers. He writes a pointer chasing >program that assumes that *NULL == NULL. He's written incorrect code (in C or similar languages). >So, the compiler that runs on brand X machines must, at least, put a >flag in the UIF stating that dereferencing NULL is allowed. The >compiler on brand Y machines should state that dereferencing NULL is >not allowed. That way the code can be made to run on any machine, >though with a preformance hit when the original compilers assumptions >don't match the reality of a specific machine. Obviously the compilers >and code generators for brand X machines are going to be set up to >produce good code for brand X computers and the same is true for brand >Y computers. But, it is still possible for UIF code generated for one >machine to be translated to be run on the other machine. > *HOW* are you going to manage this? 
Even on the VAX, the classical trap machine for *0 programmers, *0==0 is only true for a few special cases:

	*(char*)0  == 0
	*(short*)0 == 0

*BUT*

	*(int*)0   == 1041305344  !!!!  (try it...)

*AND*

	*(float*)0 == 1.5807e-30

and all bets are off for the case

	((my_struct*)0)->element_in_my_struct

So any pointer-chasing code which depends on *0==0 is going to be highly non-portable at best, and will probably break even on machines like the VAX. It is typically code like

	strcmp("something", (char*)0);

which will work on a VAX, but crash on a Sun (or any other machine which doesn't map in page 0). There is, as far as I can see, no general solution to this problem. I seem to remember that K&R say (roughly, I don't have it to hand) that 0 does not correspond to *ANY* valid data.

>So, to restate what I've said so many times (am I getting boring yet?):

Well, quite frankly, this dereferencing-NULL business comes up far too often on the net.

>It would seem that our definitions of "good code" are very different.
>My definition requires that the code do what I said to do. As I've

Just *what* are you saying when you dereference NULL? Are you saying give me 0 (and if so, how much 0), are you saying give me whatever random constant garbage happens to be at address 0, are you saying give me whatever I wrote at address 0 (and yes, such systems exist, running Unix), or are you saying `I really feel like a core dump now'???

>One of the problems I think we've had with this entire exchange is
>that it has centered around C. C is not yet standardized, and because
>it was intended to be a systems programming language C has always
>tolerated machine-dependent variations in the semantics of some of its
>operators. I believe the variation has been tolerated because it was
>believed to be justified by the resulting increase in speed. I believe

This is exactly what I mean. What's at zero is *UNDEFINED* in C, and explicitly illegal in many other languages.
>Bob Pendleton @ Evans & Sutherland
-- 
Peter Lamb
uucp: seismo!mcvax!ethz!prl   eunet: prl@ethz.uucp   Tel: +411 256 5241
Institute for Integrated Systems
ETH-Zentrum, 8092 Zurich
henry@utzoo.uucp (Henry Spencer) (09/30/88)
In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>>... To be truly portable the intermediate form MUST address
>>>the issues you mention. Even if the source language doesn't define the
>>>semantics of dereferencing NULL pointers, the intermediate form must
>>>define the semantics of dereferencing NULL pointers.
>>
>>Unfortunately, it *can't*, without being machine-specific.
>
>... if UIF allows a compiler to put a flag in the UIF
>that says that *NULL == NULL, or if UIF defines *NULL == NULL, then
>the code will run on brand Y machines, but with a speed penalty caused
>by the run time checks that the code generator had to insert to comply
>with the brand X compiler's request that *NULL == NULL.

Right. In other words, what we have done is to redefine the semantics of C to allow *NULL, thus guaranteeing that all programs with this flag in the UIF will be at a serious performance disadvantage on machines that don't allow *NULL. The semantics of *NULL are inherently and incurably machine-dependent, and any "universal" intermediate format file which specifies them is machine-dependent.

>I can't find anything about requiring compilers to deduce number
>ranges in anything in my author_copy file. What I keep saying is that
>the compiler must explicitly state its ASSUMPTIONS in the UIF form of
>a program...

How does the target machine's translator know whether it can do the arithmetic that the program wants? This cannot be stated in the UIF unless the compiler can figure it out. It can't simply be based on the compiler's host, because then a program which *doesn't* require the full range of the host's arithmetic (think of a 32-bit host and a 16-bit target, and a program which is careful not to depend on 32-bit numbers) again takes a massive efficiency hit for no good reason. It is a property of the *program*, not the host it is compiled on, whether it requires 32-bit arithmetic, the ability to dereference NULL pointers, etc.
It is difficult to deduce these things from the program, unfortunately. Modifying the program is not the answer, because there is a massive payoff for being able to use this technology on existing programs. Accepting the efficiency hits is not the answer, because there is another massive payoff for not losing efficiency. >... I believe >Henry published a paper that showed that using better algorithms is >much better than using nonportable hardware features. Geoff Collyer and I did indeed publish such a paper. However, all the effort on better algorithms is for naught if you cannot get efficient code out of the compiler. The *programmer* should not have to worry about the details of how that is done, but it is important that it be done. That is, just because I compiled something on a *NULL machine to be run on a non-*NULL machine should not mean that I take an efficiency hit every time I use a pointer -- because I'm careful to avoid needing *NULL, even though it is difficult for the compiler to know this. Note, I am not saying that it is inherently evil to accept some efficiency loss for the sake of correct functioning. What I am saying is that people who guarantee correct functioning by their own efforts don't want to take that efficiency hit for no reason. And if we are talking about something that is supposed to sell, we cannot ignore the efficiency issue. One can make a fairly good argument that we would all be better off with a small efficiency loss for the sake of correctness, but that is not the way the market thinks, and trying to re-educate the market is a really good way to go broke (if you are trying to do it for profit) or to be laughed at and ignored (even if you aren't). When I talk about the idea not being "practical", I don't mean it is technically ridiculous, I mean that it WON'T SELL -- people will not adopt it, so proposing it is pointless. 
-- The meek can have the Earth; | Henry Spencer at U of Toronto Zoology the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bpendlet@esunix.UUCP (Bob Pendleton) (10/04/88)
From article <634@eiger.iis.UUCP>, by prl@iis.UUCP (Peter Lamb):
> In article <978@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes:
>>
>>Try this scenario:
>>
>>So, the compiler that runs on brand X machines must, at least, put a
>>flag in the UIF stating that dereferencing NULL is allowed. ...
>
> *HOW* are you going to manage this?

Run time checks. How else do you check for illegal operations at run time? They can be implemented in hardware or software, I don't care which.

> There is, as far as I can see, no general solution to this problem.
> I seem to remember that K&R say (roughly, I don't have it to hand)
> that 0 does not correspond to *ANY* valid data.

Run time checks aren't a general solution? They aren't even very expensive. At least not when compared to the alternative of buggy nonportable code.

> Just *what* are you saying when you dereference NULL?

Don't ask me, ask the language definition. If it isn't defined then you've found a flaw in the language definition. That applies to any language. If the language defines a feature, you have to implement it so that it conforms to the language specification. If the language leaves it undefined, then you have to deal with the fact that it will be used, and misused, in every possible way.

I've used dialects of LISP in which (car NIL) was eq NIL and (cdr NIL) was eq NIL, and NIL, as a bit pattern, was not 0.

Is it possible that you think I'm in favor of defining *NULL to be equal to NULL and are responding to that? I'm in favor of defining the behavior of every operator in a language on all of its operand set. Since NULL can be stored in a pointer, the actions of all pointer operators when applied to NULL should, in my opinion, be defined.

>>Bob Pendleton @ Evans & Sutherland
>
> --
> Peter Lamb
> uucp: seismo!mcvax!ethz!prl   eunet: prl@ethz.uucp   Tel: +411 256 5241
> Institute for Integrated Systems
> ETH-Zentrum, 8092 Zurich
djs@actnyc.UUCP (Dave Seward) (10/07/88)
In article <1988Sep29.192410.246@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >>ranges in anything in my author_copy file. What I keep saying is that >>the compiler must explicitly state its ASSUMPTIONS in the UIF form of >>a program... > >How does the target machine's translator know whether it can do the >arithmetic that the program wants? This cannot be stated in the UIF >unless the compiler can figure it out. Or unless the programmer has a way of informing the compiler. My take is that it is not worth supporting the likes of *NULL == NULL if it can't be done effectively on all target machines, but the concept of the programmer stating his assumptions about each of a class of variably implemented features (arithmetic bit sizes, value of *NULL, et al) is a valuable one, and puts the onus on the programmer, who in this case knows explicitly that he is writing supposedly portable code. Safe or reasonable values can be assumed for these variable features, and the code generator for each target can warn about cases where it can't implement the desired option, or can't do it efficiently. It may even be reasonable for the format to contain different code (provided by the programmer) for specific critical sections, one for each variant of a variably defined feature, allowing the most efficient use of each kind of machine (for that feature). The code generator would then select the appropriate one for the target machine. An additional thought about correctness: such a portable program should be delivered with a set of verification tests so that one doesn't have to find out with one's own data and effort that the program has a machine dependency in it that prevents it from working (for some feature) on your machine. This would quickly be enforced by market dynamics after 1) several people get burned by broken programs, and 2) some vendors start to deliver test suites. Dave Seward uunet!actnyc!djs
bpendlet@esunix.UUCP (Bob Pendleton) (10/07/88)
From article <1988Sep29.192410.246@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
>
> Right. In other words, what we have done is to redefine the semantics
> of C to allow *NULL. Thus guaranteeing that all programs with this flag
> in the UIF will be at a serious performance disadvantage on machines
> that don't allow *NULL. The semantics of *NULL are inherently and
> incurably machine-dependent, and any "universal" intermediate format
> file which specifies them is machine-dependent.

I thought the semantics of *NULL were implementation dependent. Note, I did not say machine specific. There is a difference. I was trying to show how implementation-specific decisions can be passed on in a portable way and be made to work, even on machines where they don't make sense. The idea is, after all, to provide portability.

Efficiency is, of course, a critical issue. If no one cared how long something takes, computer development would never have started.

Take a look at the paper in the "Proceedings of the SIGPLAN '84 Symposium on Compiler Construction" entitled "A Portable Optimizing Compiler for Modula-2", page 310, by Michael L. Powell, who was at the time at DECWRL. I've included some relevant quotations from the paper.

"3.3 Optimizing Checks
"Runtime checks are often disabled in production programs because they
"cost so much. For example, the P-code translator, written in Berkeley
"Pascal, runs 3 times slower when runtime checks are enabled. By
"optimizing runtime checking, its benefits can be obtained at a
"fraction of the usual cost.

"The runtime checks performed by the compiler include checking variant
"tags, subranges, subscripts, and pointers. The pointer check catches
"not only bad addresses, but also pointers to objects that have been
"disposed. Checks are entered into the expression tree like any other
"expressions, appearing to be unary operators. These expressions are
"often common subexpressions or loop invariants.
Such expressions are
"also eligible for loop induction, which could replace a subscript
"check in a loop by checks of the lower and upper bounds of the loop
"index.

The following table is made from information contained in two tables in the paper. The compilers being used are Berkeley Pascal (pc), the Berkeley Unix C compiler, the DEC VMS C compiler, and the Powell Modula-2 compiler. All times are in VAX 11/780 CPU seconds.

             <------ All Opt ------>     Opt      Opt+Check
Program      Berkeley   UNIX    DEC      DEC      DEC
name         Pascal     C       C        Mod-2    Mod-2
perm            2.7      2.6     2.5      2.0      2.4
Towers          2.8      2.6     2.7      1.9      2.6
Queens          1.6      1.0     0.7      0.9      1.3
Intmm           2.2      1.7     0.8      0.8      1.1
Mm              2.7      2.2     1.3      0.9      1.2
Puzzle         12.9     12.4     4.9      4.1      6.5
Quick           1.7      1.2     0.8      0.8      1.2
Bubble          3.0      1.7     1.0      1.0      1.9
Tree            6.4      6.2     3.4      1.9      2.2
FFT             4.8      4.1     2.6      1.6      2.0

The first 3 columns give execution times with all available optimization turned on. The 4th column gives execution times for code generated with all optimizations turned on and all checking turned off. The 5th column gives execution times with all optimizations turned on and all runtime checking turned on.

Comparing columns 4 and 5 we see that runtime checking can increase runtime by as much as 50% for this Modula-2 compiler. But, with checking turned on, it generates code that is as much as twice as fast as code compiled with the Berkeley C compiler. Notice that the compiler is doing a lot more than just checking pointers for equality to NULL. This table certainly shows the cost of full runtime checking.

I hope the Berkeley C compiler has been improved during the last 4 years. I'd hate to think we were worrying about the cost of runtime checks when just using a good compiler can get you back 4 or 5 times what you lose to run time checks.

Going on to the topic of a Machine Independent Intermediate Language, the paper has this to say:

"Our P-code is a dialect of the P-code originally developed for Pascal
"compilers [Nori et al. 73].
P-code looks like machine language for a
"hypothetical stack machine and has been used successfully for
"portable compilers. For example, the Model programming language
"[Johnson and Morris 76], which generates P-code, runs on the Cray-1,
"DEC VAX, Zilog Z8000, and Motorola MC68000 computers. The principal
"features that distinguish this version of P-code from others are
"support for multiple classes of memory and specification of types and
"bit sizes on all operations.

...

"The P-code translator is a one-pass compiler of P-code into VAX code.
"It performs the compilation by doing a static interpretation of the
"P-code.

...

I've used this technique myself. It works very nicely. I first saw it described in the BCPL porting guide, which I read in ~75; I don't know when it was written. It is an old and well understood technique.

"Although P-code is machine independent, the P-code translator is
"inherently machine dependent. Decisions of what registers and
"instructions to use to implement a particular P-code operation are
"left entirely to it. However, many of the strategies span a wide class
"of computers, in particular, register-oriented ones. Thus the global
"structure of the P-code translator and many of its strategies are
"common to all the implementations, adding a degree of machine
"independence.

I hope that the existence proofs that others have posted, plus this information, convince you that the concepts behind MIILs have been known and in use for many years, that MIILs can be used as part of an optimizing compiler system, and that there need not be any performance loss as a result of using one.

> How does the target machine's translator know whether it can do the
> arithmetic that the program wants? This cannot be stated in the UIF
> unless the compiler can figure it out.
It can't simply be based on the
> compiler's host, because then a program which *doesn't* require the full
> range of the host's arithmetic (think of a 32-bit host and a 16-bit
> target, and a program which is careful not to depend on 32-bit numbers)
> again takes a massive efficiency hit for no good reason.
>
> It is a property of the *program*, not the host it is compiled on, whether
> it requires 32-bit arithmetic, the ability to dereference NULL pointers,
> etc. It is difficult to deduce these things from the program, unfortunately.
> Modifying the program is not the answer, because there is a massive payoff
> for being able to use this technology on existing programs. Accepting the
> efficiency hits is not the answer, because there is another massive payoff
> for not losing efficiency.

Yes, it is a property of the program. But, if the language doesn't allow you to declare the actual size of the data you are doing arithmetic on, and if the language doesn't define the semantics of pointer operations, then where am I going to get the information needed to make these decisions?

We could have compiler options that let you tell the compiler things you can't say in the language. We could be forced to compile by saying something like:

	cc -short=16 -long=32 -catch_null

to define the size of the arithmetic and how to handle dereferencing null pointers. Or, the compiler can get it from the host the program is compiled on. Or, we can modify the definition of the language so that the size of an int and what it means to dereference NULL are explicitly stated. Or, we could use pragmas to supply the information. No matter what mechanism is used, the information must be provided if a program is to have any chance of being automagically portable from one machine to another.

You mention the case of a program that has been designed so that it can be run efficiently on a machine with small ints (say 16 bits).
If the program is developed on a machine with large ints (say 32 bits), how does the programmer really know if it will work using small ints without testing it on a machine with small ints? The only practical way that I can think of is to use a compiler that allows you to tell it how big an int is and that generates runtime checks to make sure that small-int semantics are enforced. Testing capabilities of this sort would allow you to safely put the constraint that ints must be >= 16 bits long into the MIIL for the program.

The same thing goes for the *NULL problem. Test with runtime checks for references to *NULL. Then you can put the assertion that NULL is not dereferenced into the MIIL. Personally, I'd keep the runtime checks. Midnight calls from customers are bad enough, but having the program die without even printing a message that will help me get the customer running again is awful.

> ... Much good stuff deleted
> efficiency loss for the sake of correctness, but that is not the way the
> market thinks, and trying to re-educate the market is a really good way
> to go broke (if you are trying to do it for profit) or to be laughed at

I can't resist. Look at who I work for and tell me I don't already know. :-) If that doesn't make any sense to you, look up Evans & Sutherland in "Fundamentals of Interactive Computer Graphics" by Foley and Van Dam. Especially the stuff about the PS300.

Of course I could also tell you about the commercial success (I think 20 copies were sold) of my FORTH compiler that "fixed" everything I didn't like about FORTH.

> When I talk about the idea not being
> "practical", I don't mean it is technically ridiculous, I mean that it
> WON'T SELL -- people will not adopt it, so proposing it is pointless.

It will sell if you push it hard enough. If the alternative is having to include IBM-PC/MS-DOS compatibility as part of every machine you make, then I think the computer manufacturers will work very hard to make something like this sell.
Consider: a new machine won't sell without a large existing base of applications, and software developers can't afford to develop for machines that don't have a large installed base. A standard MIIL allows hardware vendors to compete on a price/performance basis and provides software vendors with a huge installed base of possible customers. So a software distribution standard looks like a win for hardware vendors, software vendors, and end users. A win-win-win situation isn't going to be passed up.

Consider how hard it was to get people to stop laughing at the idea of an operating system written in a high level language only 10 years ago. The technology development that made that practical didn't stop.
-- 
Bob Pendleton, speaking only for myself.

An average hammer is better for driving nails than a superior wrench.
When your only tool is a hammer, everything starts looking like a nail.

UUCP Address: decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet
greyham@ausonics.OZ (Greyham Stoney) (10/11/88)
in article <993@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says:
>
> Since NULL can be stored in a pointer, the actions of all pointer
> operators when applied to NULL should, in my opinion, be defined.

Hey.... this NULL pointer business is crazy; obviously (*NULL) is undefined - how could anyone use it? (No, I'm not saying you support it....). But if ALL actions when applied to the null pointer are to be defined, how about (*(NULL+1))? Or (*(NULL+any_old_number))? No way; it's totally machine dependent.

						Greyham

Vote *NO* to NULL pointer references!
-- 
# Greyham Stoney: (disclaimer not necessary: I'm obviously irresponsible)
# greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove. (* Official Sponsor *)
henry@utzoo.uucp (Henry Spencer) (10/11/88)
In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: >You mention the case of a program that has been designed so that it >can be run efficiently on a machine with small ints (say 16 bits). If >the program is developed on a machine with large ints (say 32 bits), >how does the programmer really know if it will work using small ints >without testing it on a machine with small ints? ... Competent programming by people who understand portability. We know this works, we do it. -- The meek can have the Earth; | Henry Spencer at U of Toronto Zoology the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu
chip@ateng.ateng.com (Chip Salzenberg) (10/14/88)
According to henry@utzoo.uucp (Henry Spencer): >In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: >>If the program is developed on a machine with large ints (say 32 bits), >>how does the programmer really know if it will work using small ints >>without testing it on a machine with small ints? ... > >Competent programming by people who understand portability. We know >this works, we do it. Just a confirmation and a testimonial here. C News Alpha runs just fine on a '286, thanks much to Messrs. Spencer and Collyer. -- Chip Salzenberg <chip@ateng.com> or <uunet!ateng!chip> A T Engineering Me? Speak for my company? Surely you jest! Beware of programmers carrying screwdrivers.
henry@utzoo.uucp (Henry Spencer) (10/16/88)
In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes: >Just a confirmation and a testimonial here. C News Alpha runs just fine >on a '286, thanks much to Messrs. Spencer and Collyer. Dept of Minor Nits: Collyer and Spencer. Geoff did all the hard stuff. -- The meek can have the Earth; | Henry Spencer at U of Toronto Zoology the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bpendlet@esunix.UUCP (Bob Pendleton) (10/19/88)
From article <44@ausonics.OZ>, by greyham@ausonics.OZ (Greyham Stoney): - in article <993@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says: -- -- Since NULL can be stored in a pointer, the actions of all pointer -- operators when applied to NULL should, in my opinion, be defined. - - Hey.... this NULL pointer business is crazy; obviously (*NULL) is undefined - - how could anyone use it? (No, I'm not saying you support it....). But if ALL - actions when applied to the null pointer are to be defined, how about: - (*(NULL+1))? or (*(NULL+any_old_number)). No way; it's totally machine - dependant. The idea was to define all of these to be runtime exceptions. Not to make them meaningful. *NULL is about as meaningful as x/0, and both should, in my opinion, cause an exception. - - Greyham - - Vote *NO* to NULL pointer references! Absolutely! - -- - # Greyham Stoney: (disclaimer not necessary: I'm obviously irresponsible) - # greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove. (* Official Sponsor *) -- Bob Pendleton, speaking only for myself. An average hammer is better for driving nails than a superior wrench. When your only tool is a hammer, everything starts looking like a nail. UUCP Address: decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet
hermit@shockeye.UUCP (Mark Buda) (10/22/88)
In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes: |According to henry@utzoo.uucp (Henry Spencer): |>In article <997@esunix.UUCP> bpendlet@esunix.UUCP (Bob Pendleton) writes: |>>If the program is developed on a machine with large ints (say 32 bits), |>>how does the programmer really know if it will work using small ints |>>without testing it on a machine with small ints? ... |> |>Competent programming by people who understand portability. We know |>this works, we do it. | |Just a confirmation and a testimonial here. C News Alpha runs just fine |on a '286, thanks much to Messrs. Spencer and Collyer. GNU CC, however, doesn't. I expected more from them... sniff... -- Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189 Dumb UUCP: ...{rutgers,ihnp4,cbosgd}!bpa!vu-vlsi!devon!shockeye!hermit Entropy will get you in the end. "A little suction does wonders." - Gary Collins
ken@gatech.edu (Ken Seefried III) (10/24/88)
In article <222@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
>|Just a confirmation and a testimonial here. C News Alpha runs just fine
>|on a '286, thanks much to Messrs. Spencer and Collyer.
>
>GNU CC, however, doesn't. I expected more from them... sniff...
>--

I'll be kind and simply call this kind of talk silly. The 80286 is an amazingly stupid design. The GNU group made some assumptions (most of them pretty reasonable) when they built gcc and its ilk. One of the biggies was 32 bits implemented in a semi-reasonable way. The 80286 is neither 32 bits nor reasonably implemented. Since the target audience for 'gcc' was 680x0, 32x32, etc. based, and the rest of the world is moving that direction, and they wanted to produce a high quality compiler, these requirements make a whole lot of sense.

I cannot believe the unmitigated gall of some people ('I expected more...'). 'gcc' will not run on the PDP-11/2 in my closet, nor will it run on the old Z80-CP/M machine that I use for a terminal, but then it was never meant to, so I tend not to bitch and moan.

Moral: if you want to run real software, get real hardware...

Oh, and please don't whine that it's all that you can afford. I know that story inside and out (being a student, and having saved a whole bunch of pennies for my computer).

>Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189

	...ken
dtynan@sultra.UUCP (Der Tynan) (10/25/88)
In article <222@shockeye.UUCP>, hermit@shockeye.UUCP (Mark Buda) writes:
> In article <1988Oct13.202604.22464@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes:
> |
> |Just a confirmation and a testimonial here. C News Alpha runs just fine
> |on a '286, thanks much to Messrs. Spencer and Collyer.
>
> GNU CC, however, doesn't. I expected more from them... sniff...
> --
> Mark Buda / Smart UUCP: hermit@shockeye.uucp / Phone(work):(717)299-5189

Check out the GNU software philosophy. RMS clearly states that when writing code for FSF, you may assume ints are 32 bits and memory space >= 1 MByte. Your expectations aside, they did what they said they'd do.
						- Der
-- 
Reply:	dtynan@sultra.UUCP (Der Tynan @ Tynan Computers)
	{mips,pyramid}!sultra!dtynan
	Cast a cold eye on life, on death. Horseman, pass by... [WBY]
greyham@ausonics.OZ (Greyham Stoney) (10/26/88)
in article <1019@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) says:
> [stuff about what I said earlier]
> The idea was to define all of these to be runtime exceptions. Not to
> make them meaningful. *NULL is about as meaningful as x/0, and both
> should, in my opinion, cause an exception.

Well, that's just not possible in many cases. Looks like it'll have to be DEFINED as being UNDEFINED.
-- 
# Greyham Stoney: (disclaimer not necessary: I'm obviously irresponsible)
# greyham@ausonics.oz - Ausonics Pty Ltd, Lane Cove. /* Official Sponsor */
# greyham@utscsd.oz - Uni of Technology, Sydney.
hermit@shockeye.UUCP (Mark Buda) (10/27/88)
In article <17536@gatech.edu> ken@gatech.UUCP (Ken Seefried iii) writes:
#In article <222@shockeye.UUCP> hermit@shockeye.UUCP (Mark Buda) writes:
#>|Just a confirmation and a testimonial here. C News Alpha runs just fine
#>|on a '286, thanks much to Messrs. Spencer and Collyer.
#>
#>GNU CC, however, doesn't. I expected more from them... sniff...
#>--
#
#I'll be kind and simply call this kind of talk silly. The 80286 is an
#amazingly stupid design.

I agree wholeheartedly. The only semi-reasonable processor in the family is the 80386, and that's pretty bad too.

#the GNU group made some assumptions (most of them pretty reasonable)
#when they built gcc and its ilk. One of the biggies was 32 bits implemented
#in a semi-reasonable way. The 80286 is neither 32 bits nor reasonably
#implemented. Since the target audience for 'gcc' was 680x0, 32x32, etc.
#based, and the rest of the world is moving that direction, and they wanted
#to produce a high quality compiler, these requirements make a whole lot of
#sense.

I think the problem is that I didn't make something clear in my original posting. I don't want to compile *for* the 286. I want to compile for a 386, on a 386, but the compilers I have only understand 8086/286, and I'm damned if I'm going to spend hundreds of dollars for a compiler I'll only use once.

#I cannot believe the unmitigated gall of some people ( 'I expected
#more...' ).

The only thing I object to in GNU CC is the attitude that you can put a pointer in an int, or pass '0' for a null pointer where the portable thing is '(char *)0' or NULL.

#Moral: if you want to run real software, get real hardware...
#
#Oh, and please don't whine that it's all that you can afford.

It's not mine. I'll keep my mouth shut from now on.
tron1@xanadu.UUCP (Kenneth Jamieson) (11/01/88)
It seems to me that there was something in that article about those slots being independently something-or-othered. Powered, that was it. The machine looks clean, yet I wonder how well the proprietary monitor and stuff idea will catch on?

Also, about its design: OK, a 200+ meg optical main drive is nice, but won't a low-storage (or at least cheap) device be needed? I mean, I don't know of any software houses that will want to publish a word processor on $50 disks.
-- 
******************************************************************************
* All rumors about my death are true.            {...}galaxy!dsoft \         *
* Responsibility is management's word for blame.       --- xanadu!tron1      *
* "The world is GOD's source level debugger"     {...}s4mjs!       /         *
******************************************************************************