jack@boring.UUCP (05/20/85)
[ Note that I added net.arch to the newsgroups, since this is probably
where this discussion belongs. ]

In article <557@terak.UUCP> doug@terak.UUCP (Doug Pardee) writes:
>**** WARNING **** The following comments are not as nice as etiquette
>recommends.

Agreed. Also, I think they're not true.

>> I think total orthogonality would be *very* useful.
>> ...
>> A 68K compiler has to think about modifying the branch condition, etc.
>> A 32K compiler just generates code in the way it sees the statement.
>>
>> Of course, an optimizer might throw everything around again
>> to save registers or whatever, but the initial code generation is
>> much simpler in the 32K case.
>
>What in heck do you think we users are paying you compiler writers to
>DO?
>
>The purpose of a CPU is to solve the *user's* application as quickly as
>possible.

Agreed. In my opinion, this means that the CPU should be optimized for
what most users do most of the time: running high-level language
programs.

>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.

Not agreed. If a machine is simple, the compiler is simpler, and thus it
is available sooner, doesn't have as many bugs, etc.

>Why on earth should the design of a CPU be based on how easy it will
>make the jobs of the five people who will write the compilers for it?

Because *EVERYONE* will use the product of those five people. If, for
instance, a compiler for a certain machine generates lousy code for a
for-loop because the compiler writers didn't have time to optimize it --
they were too busy getting the compiler to *work* -- that will
eventually waste *HOURS* of CPU time for everyone using it.

This is also the whole point behind the RISC architecture, one of the
rising stars at the moment.
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.
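For concreteness, this is the kind of bookkeeping Jack means by
"modifying the branch condition": when a machine only accepts compare
operands in one order, the code generator must swap them and patch the
condition it branches on. A minimal C sketch, with hypothetical names
(no actual compiler's tables are being quoted here):

	enum cond { LT, LE, GT, GE, EQ, NE };

	/* condition to branch on if the compare operands were swapped:
	 * a < b  is the same test as  b > a, and so on */
	enum cond swap_cond(enum cond c)
	{
	    switch (c) {
	    case LT: return GT;
	    case LE: return GE;
	    case GT: return LT;
	    case GE: return LE;
	    default: return c;      /* EQ and NE are symmetric */
	    }
	}

	/* condition to branch on if the sense of the test is inverted,
	 * e.g. "branch around the then-part unless a < b" */
	enum cond invert_cond(enum cond c)
	{
	    switch (c) {
	    case LT: return GE;  case GE: return LT;
	    case LE: return GT;  case GT: return LE;
	    case EQ: return NE;  default: return EQ;
	    }
	}

The tables are not hard to write; the point is that they are one more
machine-specific special case that the "32K-style" compiler never needs
at all.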
doug@terak.UUCP (Doug Pardee) (05/22/85)
me>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
me>possible.

> Not agreed. If a machine is simple, the compiler is simpler, and thus it
> is available sooner, doesn't have as many bugs, etc.

Did I miss something here?  Since when is it any concern of mine, as a
user, whether the compiler is simple???

And I have seen no evidence that compilers for "simple" machines are
available any sooner, or are any more reliable, than compilers for
warpo machines.

me>Why on earth should the design of a CPU be based on how easy it will
me>make the jobs of the five people who will write the compilers for it?

One response:
> Because *EVERYONE* will use the product of those five people.

But that doesn't address the question of why the comfort and
convenience of those five people is of any concern to "*EVERYONE*".

Another response:
> If you have tried to hire good compiler people lately, you know that
> compiler-writer time is neither cheap nor in infinite supply.

Ah, here we finally get to the nitty-gritty.  What we're saying is that
we want CPUs that are easy to write compilers for so that we can hire
less-capable (aka *cheaper*) programmers to write the compilers!!!

Given how few microprocessor instruction sets there are, and how few
languages of interest, you don't *need* an "infinite supply" of
compiler programmers.  In fact, about a dozen could do the job for the
entire microcomputer world.  There are certainly a dozen top-notch
compiler programmers available for this task.  And given the importance
of having good compilers, they're worth whatever they get paid.

But CPUs and compilers are put out by IC manufacturers, and they
understand chips better than software.  So they tend to put their money
into design work on the chip, and hire cheap programming labor to
produce less-than-thrilling compilers.

Since the manufacturers' compilers are often poor, third-party
operations spring up all over the place to try to cash in.  Typically
underfinanced, these operations *also* hire cheap programming labor and
produce less-than-thrilling compilers.  And the vacuum remains, so even
more third-party start-ups appear.

For heaven's sake, how many C compilers do we have to develop for the
68000 before we get one that's good???  Wouldn't it have been a whole
lot easier if Motorola or Microsoft or *someone* had put up the bucks
necessary to hire real compiler writers in the first place?

I think it makes more sense to take compiler-writing seriously, rather
than try to kludge the CPU so that every basement hacker can write what
he calls a "compiler".
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp
kds@intelca.UUCP (Ken Shoemaker) (05/23/85)
> [ Note that I added net.arch to the newsgroups, since this is probably
> where this discussion belongs. ]
>
> >The purpose of a CPU is *NOT* to be as easy to write a compiler for as
> >possible.
>
> Not agreed. If a machine is simple, the compiler is simpler, and thus it
> is available sooner, doesn't have as many bugs, etc.
>
> This is also the whole point behind the RISC architecture, one of the
> rising stars at the moment.
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

Not entirely true.  RISC machines are simple in hardware, but they push
complexity back onto the compiler.  For example:

 - the only instructions that can access memory are mov (or load/store)
   operations
 - jumps take effect only after the instruction following the jump has
   been executed
 - some don't have hardware interlocks to prevent a register being read
   before a previous write to it has completed, so you have to remember
   to do enough in between so you don't have problems
 - they don't allow arbitrary byte boundaries for code/data

You can argue that this is merely code reorganization, but they are
implemented this way so that you can eliminate both hardware pipeline
stages and the delays in each stage.

Just my impressions...
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds

---the above views are personal.  They may not represent those of Intel.
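To illustrate the interlock point in Ken's list: on such a machine the
compiler itself must notice load-use hazards.  A minimal C sketch under
assumed, simplified rules (a one-cycle load delay and no reordering;
the instruction record is hypothetical):

	#include <stdio.h>

	/* A toy instruction record: just enough to detect the hazard. */
	struct insn {
	    const char *text;
	    int is_load;        /* 1 if this insn loads memory into 'writes' */
	    int writes;         /* register written, -1 if none */
	    int reads1, reads2; /* registers read, -1 if unused */
	};

	/* Emit the program, inserting a nop wherever the instruction
	 * right after a load reads the register the load is still
	 * fetching. */
	void emit(const struct insn *prog, int n)
	{
	    for (int i = 0; i < n; i++) {
	        puts(prog[i].text);
	        if (prog[i].is_load && i + 1 < n &&
	            (prog[i + 1].reads1 == prog[i].writes ||
	             prog[i + 1].reads2 == prog[i].writes))
	            puts("\tnop\t\t; no interlock: wait for the load");
	    }
	}

A real compiler would of course try to move some independent
instruction into the gap instead of wasting it on a nop, which is
exactly the "code reorganization" Ken is talking about.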
jack@boring.UUCP (05/25/85)
I'm not sure whether Doug Pardee is serious, or just trying to keep the
discussion going.  I'll assume he *is* serious, and answer him anyway.

Doug>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
Doug>possible.

me> Not agreed. If a machine is simple, the compiler is simpler, and thus it
me> is available sooner, doesn't have as many bugs, etc.

Doug>Did I miss something here?  Since when is it any concern of mine, as a
Doug>user, whether the compiler is simple???

Exactly for the reasons stated above.  You don't want to argue that a
4000-line compiler isn't easier to maintain, debug, etc. than one of
8000 lines, I hope?

Doug>And I have seen no evidence that compilers for "simple" machines are
Doug>available any sooner, or are any more reliable, than compilers for
Doug>warpo machines.

No?  Get yourself a PR1ME, and try the pascal compiler :-(

Now, I won't comment on the rest point-by-point, since it would get too
long-winded that way.  Let me just explain the following point.  When
you are designing a machine, you are facing two size problems:

1. How do I fit all those transistors on this little square?
2. How do I fit all those opcodes in those 16 bits?

An orthogonal design is clearly good for (1), since it allows you to
use the hardware (or firmware) for calculating "x'100(a6:B)[d0]" many
times over.

Now, to satisfy (2), you can do two things:
- Make the operand fields small, so you can have many opcodes.
- Make the opcode fields small, so you can have complicated operands.
(I won't go into RISC here, which makes both of them small.)

If you take the first choice, you can have lots of nifty instructions
like 'search for a one bit, and return the position in a register' or
'copy a string and translate', and those kinds of things, which will
*never* be used by *any* compiler (except for cobol, maybe), since most
high-level languages don't have a construct for them.  Can you imagine
a compiler that would recognize

	for(p=src, q=dst; *p; p++, q++) *q = table[*p];

and translate it into the above-mentioned instruction?

If you take the second branch, you will *not* have a string translate
instruction.  You will, however, have the ability to make your design
orthogonal.

Wirth (I think; I'm not sure) measured long ago that the average
expression has 1.5 operands.  This means that half of the statements
you write will be expressible in *one* instruction, provided that the
machine lets you address something on the stack as an operand.
For example:

	a += b;

orthogonal:
	add	b(r5),a(r5)
non-orthogonal:
	mov	a(r5),r3	<-- AND MAKE SURE IT'S FREE!!
	add	b(r5),r3
	mov	r3,a(r5)

Now, in cycles, the first one results in 4 memory cycles and 3
additions, and the second in 6 memory cycles and 4 additions (PLUS an
additional 2 instruction decodes).

Well, this has got long-winded after all; sorry for that.

You may do what you want, but I'll stick to hardware that was designed
by software people.
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.
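Jack's comparison, seen from the code generator's side: on the
orthogonal two-address machine the generator emits one instruction for
"a += b", while on the other machine it must first find (and possibly
spill) a scratch register.  A minimal C sketch; the allocator and the
mnemonics are hypothetical illustration, not from any of the compilers
discussed:

	#include <stdio.h>

	/* Toy register allocator.  In a real compiler, running out of
	 * registers forces a spill -- exactly the bookkeeping the
	 * orthogonal case avoids. */
	static int next_reg = 3;
	static int alloc_reg(void) { return next_reg++; }
	static void free_reg(int r) { next_reg = r; }

	/* Emit code for "a += b", a and b being frame offsets off r5. */
	void gen_add_assign(int a, int b, int orthogonal)
	{
	    if (orthogonal) {
	        printf("\tadd\t%d(r5),%d(r5)\n", b, a);
	    } else {
	        int r = alloc_reg();
	        printf("\tmov\t%d(r5),r%d\n", a, r);
	        printf("\tadd\t%d(r5),r%d\n", b, r);
	        printf("\tmov\tr%d,%d(r5)\n", r, a);
	        free_reg(r);
	    }
	}

The one-instruction branch of the sketch needs no register state at
all, which is why the statement-level code generator stays small.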
cdshaw@watmum.UUCP (Chris Shaw) (05/27/85)
>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.

All right then, what IS the purpose of a CPU??  It would seem to me
that the purpose of a CPU is to run programs.  The purpose of a
well-designed instruction set is to make it as easy to program as
possible without sacrificing performance.

Now, it also seems to me that an intelligent CPU design takes into
account the types of programs that will run on it.  Thus, it's obvious
that the 8035 was never designed to be anything more than a controller.
When designing the 32032, then, the kind of programs the designers of
the chip had in mind were those that would be created by high-level
languages.  Thus, they made the instruction set as easy as possible to
write compilers for.

Obviously, orthogonality doesn't matter quite so much on a controller,
where the programmer is a human, not a program.  On a general-purpose
CPU, however, most programs will be created by programs (compilers), so
it makes sense to tailor the instruction set to its intended
programmers.  Anybody who has written a compiler will tell you that
ortho machines are easier to write compilers for.  It's a simple fact
that has been true since day 1.

The benefits of programs that are easy to write vs. hard to write are
as follows:

1) Productivity of the programmer is much higher.  Despite Mr.
   Trissel's comments, compiler writers are harder to come by than
   (say) COBOL programmers, and are therefore more expensive.  Simply
   asking for better programmers doesn't solve this problem.
   Therefore, the more productive your programmers, the better.  Of
   course, if the market for 8035 C compilers is twice that for 68000 C
   compilers, then maybe start writing 8035 stuff, but that's another
   matter entirely.

2) Program correctness (lack of compiler bugs).  All things being equal
   (which they aren't), a compiler for a weird machine produced from N
   man-months of labour will generally be less correct than one for an
   ortho machine.  This point is really an outgrowth of productivity.
   Almost as importantly, a compiler for an ortho machine will be
   easier to maintain and fix bugs in than one for a non-ortho machine,
   since there is no complicated register-assignment algorithm, etc.

3) Object code speed.  Given that CPUs x and y have the same hardware
   but different instruction sets (two microcode sets, say), compiler
   code produced for the ortho version is most likely going to be
   faster, since special-purpose register decisions are not reflected
   in the code.  In other words, non-orthogonality generates
   superfluous moves that would not be necessary on an ortho machine.
   This point is true whether the code is compiler- or human-produced.
   The lack of a general reg-to-reg add on the Z80, for example, is the
   cause of many wasted reg-to-reg MOVs, or worse, reg-to-memory MOVs.

>I think it makes more sense to take compiler-writing seriously, rather
>than try to kludge the CPU so that every basement hacker can write what
>he calls a "compiler".
>-- 
>Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug

I think this point is ripe nonsense.  The bit which grabs me worst, of
course, is the twisted use of the word "kludge".  And as for this
garbage about basement hackers, well....  (I guess it's time to go
upstairs for a beer & mellow out :-)

Chris Shaw    watmath!watmum!cdshaw  or  cdshaw@watmath
University of Waterloo
In doubt?  Eat hot high-speed death -- the experts' choice in gastric
vileness!
johnl@ima.UUCP (05/29/85)
/* Written 9:10 pm May 27, 1985 by g-frank@gumby in ima:net.micro.68k */
> Clever compilers for almost any language but C can paper over most
> sorts of yawning chasms.  Modula-2, Pascal, Ada, all are languages
> that port quite well to the 8086 family, and produce efficient,
> readable code without any sort of trickery required of the programmer.
> I have a stupid 68000 system in my basement that I can't use and
> can't sell because there's no software for it, and one of those
> [Intel CPU] vending machines sitting on my desk.

I wish I could get great performance out of my vending machine CPU
merely by switching languages.  Unfortunately, the last time I looked,
Modula, Pascal, and Ada compilers didn't produce notably better code
than did C compilers.  Would it were true.

In every case, if your total data is bigger than 64K, something gives.
You always find limits, like the total automatic data for a procedure
(or, often, the whole program) being less than 64K.  I'm not talking
about a single huge array -- it's just as bad if you have lots of small
things all of which add up to more than 64K.  The need to distinguish
between long and short pointers in order to produce decent code always
pops up in 8086 compilers somehow: either by not allowing long
pointers, by generating poor large-model code everywhere, or by putting
some wart in the language that lets the programmer tell the compiler
what's long and what's short.

There were legitimate reasons for IBM to pick the 8088 for the PC.  I
gather that the main competitor at the time was the Z80, so we should
be thankful for something; 68008s weren't sufficiently available, and
for price reasons they wanted to stick with an 8-bit bus.  And I also
recall that the original PC came with only 16K, and loading up the
machine past 128K was a big deal.  But none of that means that the 8088
or the 286 is at all easy to program for the sorts of things that
people are doing on PCs now.  It also doesn't mean that a chip that was
designed to be spiritually compatible with the 8080 is much of a choice
for a general computing engine.

John Levine, ima!johnl

PS: I hear that for applications with limited amounts of data and lots
of real-time I/O requirements, such as controlling vending machines,
the 8088 is just great.
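The "wart in the language" John mentions looked roughly like this in
the 8086 C dialects of the day -- vendors extended C with 'near' and
'far' so the programmer could say which pointers carry a full
segment:offset pair (the exact spelling varied by compiler; treat the
details below as illustrative):

	/* Defined away so this sketch compiles anywhere; an 8086-era
	 * compiler supplied these as keywords. */
	#define near
	#define far

	char near *scratch;  /* 16-bit offset into the default segment */
	char far  *bigbuf;   /* 32-bit segment:offset; can point anywhere */

	long sum(char far *p, unsigned n)
	{
	    long s = 0;
	    while (n-- > 0)
	        s += *p++;   /* far arithmetic: slower, juggles a
	                      * segment register on every step */
	    return s;
	}

Getting the annotations wrong meant either broken pointers (too short)
or needlessly slow code (too long), which is exactly the decision a
compiler for a flat-address machine never has to ask the programmer.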
thomson@uthub.UUCP (Brian Thomson) (05/29/85)
Chris Shaw writes about orthogonality:
>When designing the 32032 then, the kind of programs the designers of
>the chip had in mind were those that would be created by high-level
>languages.  Thus, they made the instruction set as easy as possible to
>write compilers for.
>On a general-purpose CPU, ... most programs will be created by
>programs (compilers), so it makes sense to tailor the instruction set
>to its intended programmers.

In my experience, the difficulty of (decent) compiler construction is
affected less by orthogonality than by the number of code sequences
that must be considered when implementing a given source language
construct.  The C statement

	a = b * c + d + e;

might, in different contexts, be implemented on your 32032 as:

	movd	_b,r0
	muld	_c,r0
	addd	_d,r0
	addd	_e,r0
	movd	r0,_a

or, if c is the constant 2, d a stack local, and e the constant 4,

	movd	_b,r0
	addr	4(-4(fp))[r0:w],_a

or even, if b, c, and d are all unsigned shorts, and e == b,

	movzwd	_b,r0
	indexw	r0,_c,_d	; b * (c+1) + d
	movd	r0,_a

Does that last one look ridiculous?  That's exactly my point: it's the
best code sequence under the given set of assumptions, and no compiler
would ever find it.

If these fancy addressing modes and high-level-language-oriented
instructions could be added without penalizing the performance of
bread-and-butter instructions, I'd be all for it, but such is never the
case.

If a machine forces me to put something in a data register before I can
add to it, and has no exceptions to this rule, it will be easy to
generate code.  It only gets tough when there are options.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
	{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson
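Brian's difficulty can be phrased as a search problem: once the
architecture offers several ways to compile one construct, the code
generator must enumerate candidate patterns and pick the cheapest match
-- and every pattern nobody thought to write is speed left on the
table.  A minimal C sketch with hypothetical types (real instruction
selectors are far more elaborate):

	#include <stddef.h>

	struct expr;                /* opaque expression tree */

	struct pattern {
	    int (*matches)(const struct expr *e); /* does this tree fit? */
	    int cost;                             /* rough cycle count */
	    void (*emit)(const struct expr *e);
	};

	/* Pick the cheapest pattern that matches the expression. */
	const struct pattern *select_insn(const struct pattern *pats,
	                                  int n, const struct expr *e)
	{
	    const struct pattern *best = NULL;
	    for (int i = 0; i < n; i++)
	        if (pats[i].matches(e) &&
	            (best == NULL || pats[i].cost < best->cost))
	            best = &pats[i];
	    return best;            /* caller runs best->emit(e) */
	}

On the machine with no options, the pattern table has one entry per
construct and the loop is trivial; the indexw sequence above would be a
third entry for plain assignment alone.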
doug@terak.UUCP (Doug Pardee) (06/03/85)
Wait a second!  It looks like I should have used one of my "patented"
200-line postings, because an awful lot of people have misinterpreted
my comments.

The original posting to which I had responded did *not* say that EA
orthogonality would result in better compiled code.  It said that EA
orthogonality would allow the compiler writer to save himself the
trouble of swapping operands on a compare instruction and logically
inverting the branch condition.  This does *NOT* improve the
performance of the compiled code.  In fact, on the NS320xx CPUs (the
only ones around with 2-address architecture), a "backwards" compare
instruction takes an extra 2 clock cycles of execution time.

I have no objection to compiler writers who wish to make a case that EA
orthogonality will result in better compiled object code.  But I object
strenuously to the notion that, regardless of whether it would benefit
or hurt the users, the CPU architecture should be changed to please
lazy compiler writers.

EA orthogonality should be argued on the basis of the efficiency of the
resulting object code, not on the ease with which the handful of
compiler writers can do their job.  Some of the notes have indicated
that these concerns are one and the same.  Sometimes, but not always.
Here's a choice counter-example: some RISC machines have a "branch
*after* next instruction" operation.  This allows the pipeline to be
used more efficiently.  It results in more efficient object code than
conventional branch instructions, but it is a booger-bear to write an
effective compiler for.

A lot of folks have also suggested that compilers which were easily
written (I call them "hastily knocked out" :-) are more bug-free than
ones that took some time to implement.  I maintain that the quantity of
bugs is related to the quantity and quality of design and debugging.
Now how much design and debugging do you expect to get from a compiler
writer who thinks that putting the operands of a "compare" instruction
in the proper order is "too much work"?

It is also said that good compilers take longer to produce than crummy
ones.  True.  Are we all so impatient that we'd rather have a crummy
compiler now than wait six months for a good one?

And it has been said that good compilers cost more than crummy ones.
I'm not exactly surprised.  Isn't there an old saw about "only getting
what you pay for"?

I suggest that part of the problem here is that a lot of folks who are
reading this hope to write The Great American Compiler.  They weren't
planning on spending the time and money to write a good compiler.  And
they don't much care for hearing suggestions that users don't want to
buy crummy compilers.  (Have at it, my mailbox is asbestos-lined now.)
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp
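For those who haven't met "branch *after* next instruction" (a delay
slot): after emitting a branch, the compiler must find an instruction
that is safe to execute whether or not the branch is taken, or waste
the slot on a nop.  A minimal C sketch over a toy instruction list --
hypothetical and much simplified, since real slot-filling must respect
all data dependences, not just the condition register:

	#include <stdio.h>

	struct insn {
	    const char *text;
	    int writes;     /* register this insn writes, -1 if none */
	};

	/* An insn may sit in the delay slot only if it doesn't compute
	 * the branch condition (a real compiler also checks that it
	 * isn't itself a branch, can't trap, etc.). */
	static int safe_in_slot(const struct insn *in, int cond_reg)
	{
	    return in->writes != cond_reg;
	}

	/* Emit a basic block, then a branch on cond_reg, trying to pull
	 * the block's last instruction down into the delay slot. */
	void emit_block_with_branch(const struct insn *code, int n,
	                            int cond_reg, const char *branch)
	{
	    int fill = n > 0 && safe_in_slot(&code[n - 1], cond_reg);

	    for (int i = 0; i < n - fill; i++)
	        puts(code[i].text);
	    puts(branch);
	    if (fill)
	        puts(code[n - 1].text); /* executes in the delay slot */
	    else
	        puts("\tnop\t\t; delay slot wasted");
	}

Every slot that ends up as a nop gives back the pipeline win, which is
why doing this *well* is the hard part -- exactly Doug's point that
easy-to-compile-for and fast-object-code are not the same axis.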
rap@oliveb.UUCP (Robert A. Pease) (06/05/85)
> You may do what you want, but I'll stick to hardware that was designed
> by software people.
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

The thing that I keep thinking about is that every paper, article,
text, or whatever I have seen on the subject says that the best way to
design a system is to first decide what the application will be and
then design the hardware to support the design goals.  Seems to me,
then, that an orthogonal architecture would support high-level
languages much better than one that is not orthogonal.  Or do I just
see things more clearly than others? :-)
-- 
					Robert A. Pease
    {hplabs|zehntel|fortune|ios|tolerant|allegra|tymix}!oliveb!oliven!rap
paul@greipa.UUCP (Paul A. Vixie) (06/05/85)
In article <210@uthub.UUCP> thomson@uthub.UUCP (Brian Thomson) writes:
>The C statement
>	a = b * c + d + e;
>
>might, in different contexts, be implemented on your 32032 as:
>	movd	_b,r0
>	muld	_c,r0
>	addd	_d,r0
>	addd	_e,r0
>	movd	r0,_a
>
>or, if c is the constant 2, d a stack local, and e the constant 4,
>	movd	_b,r0
>	addr	4(-4(fp))[r0:w],_a
>
>or even, if b, c, and d are all unsigned shorts, and e == b,
>	movzwd	_b,r0
>	indexw	r0,_c,_d	; b * (c+1) + d
>	movd	r0,_a

Or, how about:

	; extern long int a, b, c, d, e;
	; a = b * c + d + e;
	movd	ext(_b), tos
	muld	ext(_c), tos
	addd	ext(_d), tos
	addd	ext(_e), tos
	movd	tos, ext(_a)

	; extern long int a, b;
	; #define c 2
	; auto long int d;
	; #define e 4
	; a = b * c + d + e;
	movd	ext(_b), tos
	muld	2, tos
	addd	4(fp), tos
	addd	4, tos
	movd	tos, ext(_a)

	; extern long int a;
	; extern unsigned short int b, c, d;
	; a = b * c + d + b;
	movzwd	ext(_b), tos
	movzwd	ext(_c), tos
	muld	tos, tos
	movzwd	ext(_d), tos
	addd	tos, tos
	movzwd	ext(_b), tos
	addd	tos, tos
	movd	tos, ext(_a)

----------------

The above code is not very pretty or efficient.  In each case I have
done five operations: move, multiply, add, add, move.  The only real
difference is in the addressing modes; this seems typical of
compiler-generated code.

I am no longer (thank <insert deity here>) an expert on the 68xxx, but
I don't remember an external or frame-relative addressing mode; one
assumes that the many otherwise-useless address registers will be used
to hold the current global and frame pointers, and the loader has a lot
of fixing up to do on those globals -- every reference needs
modification, not just an extern table (unless you plan to have your
compiler generate enough low-level stuff to do what the 32xxx external
addressing mode does automagically).

Not being a compiler writer (yet :-), anyway) I don't see many other
things a compiler could optimize for (except the "muld 2, tos", which
could have been "ashd 1, tos", but only vax-11 C from DEC does this).
I do know that the 68xxx's addressing modes and strange restrictions on
address and data registers are more characteristic of RISC than of a
machine with all those instructions.  Can the 68xxx even do an
"addd -(sp), (sp)" without doing the pop at the wrong time?  The one I
worked with didn't have any memory-to-memory instructions; you could do
register-to-memory, memory-to-register, or register-to-register, but
they were all different instructions (in fact, different instructions
for address and data registers, and that's when they felt like
providing them -- often you had to move into an (address or data)
register from a (data or address) register to do a simple operation).

Gosh, what a ramble.  Sorry about that, everybody.  My point in all
this is that a compiler can generate *clean* code *easily* for the
32xxx because of all the neato addressing modes; generating code for
the 68xxx is either (easy, ugly, inefficient) or (hard, functional,
efficient), but that's like a choice between the electric chair and the
gas chamber.

Paul Vixie
{pyramid,dual,decwrl}!greipa!paul
mark@rtech.UUCP (Mark Wittenberg) (06/05/85)
> For example:
> 	a += b;
> orthogonal:
> 	add	b(r5),a(r5)
> non-orthogonal:
> 	mov	a(r5),r3	<-- AND MAKE SURE IT'S FREE!!
> 	add	b(r5),r3
> 	mov	r3,a(r5)
> Now, in cycles, the first one results in 4 memory cycles and
> 3 additions, and the second in 6 memory cycles and 4 additions (PLUS
> an additional 2 instruction decodes).
>
> -- 
> 	Jack Jansen, jack@mcvax.UUCP
> 	The shell is my oyster.

And furthermore, the orthogonal sequence is normally atomic; in an OS
kernel the non-orthogonal sequence might easily have to be protected by
a "disable/enable interrupt" sequence around it, or "test-and-set" or
some such in a multi-processor system (e.g., "a" and "b" might be
global vars).  Multi-process user programs would need "enter/exit
monitor" or "block-on-semaphore" sequences.  Besides being a pain
(sometimes a royal pain), this has the potential to eat a lot of CPU
time.
-- 
Mark Wittenberg
Relational Technology
zehntel!rtech!mark
ucbvax!mtxinu!rtech!mark
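Mark's hazard, spelled out in C: if "a += b" compiles to
load/add/store, an interrupt handler (or another processor) that
updates 'a' between the load and the store gets its update silently
lost.  A minimal sketch of the classic uniprocessor fix -- the
disable/enable primitives are assumed kernel routines, not any
particular OS's API:

	volatile long a;    /* shared with an interrupt handler */
	long b;

	extern void disable_interrupts(void);  /* assumed primitives */
	extern void enable_interrupts(void);

	void add_b_to_a(void)
	{
	    disable_interrupts();
	    /* the three-instruction sequence is now effectively atomic:
	     *     mov  a(r5),r3
	     *     add  b(r5),r3
	     *     mov  r3,a(r5)
	     */
	    a += b;
	    enable_interrupts();
	}

On the orthogonal machine the single memory-to-memory add needs none of
this bracketing (on one processor, at least), which is the CPU-time
saving Mark is pointing at.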