doug@terak.UUCP (Doug Pardee) (05/16/85)
**** WARNING **** The following comments are not as nice as etiquette
recommends.

> I think total orthogonality would be *very* useful.
> ...
> A 68K compiler has to think about modifying the branch condition, etc.
> A 32K compiler just generates code in the way it sees the statement.
>
> Of course, an optimizer might throw everything around again
> to save registers or whatever, but the initial code generation is
> much simpler in the 32K case.

What in heck do you think we users are paying you compiler writers to
DO?

The purpose of a CPU is to solve the *user's* application as quickly as
possible.

The purpose of a CPU is *NOT* to be as easy to write a compiler for as
possible.

Why on earth should the design of a CPU be based on how easy it will
make the jobs of the five people who will write the compilers for it?
--
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
                                                          ^^^^^--- soon to be CalComp
henry@utzoo.UUCP (Henry Spencer) (05/19/85)
> What in heck do you think we users are paying you compiler writers to
> DO?

How much do you want to pay us?  A lot, or not so much?  You could not
possibly pay me enough money to get me to implement a compiler for some
of the scummier machines around, unless your wallet is a lot bigger than
one would expect.

> The purpose of a CPU is to solve the *user's* application as quickly
> as possible.
>
> The purpose of a CPU is *NOT* to be as easy to write a compiler for as
> possible.

Have you considered that the two may be related?  Difficult compilation
generally means poorer compilers, i.e. poorer performance for the user.

> Why on earth should the design of a CPU be based on how easy it will
> make the jobs of the five people who will write the compilers for it?

Because it will result in faster and more reliable compilers that
produce better code and better error messages.  If you have tried to
hire good compiler people lately, you know that compiler-writer time is
neither cheap nor in infinite supply.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
cdshaw@watmum.UUCP (Chris Shaw) (05/20/85)
| What in heck do you think we users are paying you compiler writers to
| DO?
|
| The purpose of a CPU is *NOT* to be as easy to write a compiler for as
| possible.
|
| Why on earth should the design of a CPU be based on how easy it will
| make the jobs of the five people who will write the compilers for it?
| --
| Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
|                                                         ^^^^^--- soon to be CalComp

All right... then who should the opcodes be designed FOR, in your
opinion... COBOL programmers?  FORTRAN programmers?  I'm sure you've
done assembler on an awful machine in your time.  I know I have
assembled for the worst commercially available processors in existence.

The ultimate lesson to be gained from such machines as the 1802, 6502
and Z80 is that orthogonality gains you a very noticeable productivity
improvement when coding in assembler.  When one is writing a program
that codes in assembler (i.e., a compiler), orthogonality is a humungous
win, because you don't have to code register-use weirdnesses into your
compiler.  You don't have to worry about what kind of expression you're
evaluating when you produce code to do it, etc., etc.  The end result is
that the compiler for an ortho machine is more likely right and gets to
market faster, all other things being equal.

Chris Shaw    watmath!watmum!cdshaw
University of Waterloo
jack@boring.UUCP (05/20/85)
[ Note that I added net.arch to the newsgroups, since this is probably
where this discussion belongs ]

In article <557@terak.UUCP> doug@terak.UUCP (Doug Pardee) writes:
>**** WARNING **** The following comments are not as nice as etiquette
>recommends.

Agreed.  Also, I think they're not true.

>> I think total orthogonality would be *very* useful.
>> ...
>> A 68K compiler has to think about modifying the branch condition, etc.
>> A 32K compiler just generates code in the way it sees the statement.
>>
>> Of course, an optimizer might throw everything around again
>> to save registers or whatever, but the initial code generation is
>> much simpler in the 32K case.
>
>What in heck do you think we users are paying you compiler writers to
>DO?
>
>The purpose of a CPU is to solve the *user's* application as quickly as
>possible.

Agreed.  In my opinion, this means that the CPU should be optimized for
what most users do most of the time: running high-level language
programs.

>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.

Not agreed.  If a machine is simple, the compiler is simpler, and thus
it is available sooner, doesn't have as many bugs, etc.

>Why on earth should the design of a CPU be based on how easy it will
>make the jobs of the five people who will write the compilers for it?

Because *EVERYONE* will use the product of those five people.  If, for
instance, a compiler for a certain machine generates lousy code for a
for-loop because the compiler writers didn't have time to optimize it
(they were too busy getting the compiler to *work*), that will
eventually waste *HOURS* of CPU time for everyone using it.

This is also the whole point behind RISC architecture, one of the
rising stars at the moment.
--
Jack Jansen, jack@mcvax.UUCP
The shell is my oyster.
doug@terak.UUCP (Doug Pardee) (05/22/85)
me> The purpose of a CPU is *NOT* to be as easy to write a compiler for
me> as possible.

> Not agreed.  If a machine is simple, the compiler is simpler, and thus
> it is available sooner, doesn't have as many bugs, etc.

Did I miss something here?  Since when is it any concern of mine, as a
user, whether the compiler is simple???  And I have seen no evidence
that compilers for "simple" machines are available any sooner, or are
any more reliable, than compilers for warpo machines.

me> Why on earth should the design of a CPU be based on how easy it will
me> make the jobs of the five people who will write the compilers for it?

One response:

> Because *EVERYONE* will use the product of those five people.

But that doesn't address the question as to why the comfort and
convenience of those five people is of any concern to "*EVERYONE*".

Another response:

> If you have tried to hire good compiler people lately, you know that
> compiler-writer time is neither cheap nor in infinite supply.

Ah, here we finally get to the nitty-gritty.  What we're saying is that
we want to have CPUs that are easy to write compilers for so that we can
hire less-capable (aka *cheaper*) programmers to write the compilers!!!

Given how few microprocessor instruction sets there are, and how few
languages of interest, you don't *need* an "infinite supply" of compiler
programmers.  In fact, about a dozen could do the job for the entire
microcomputer world.  There are certainly a dozen top-notch compiler
programmers available for this task.  And given the importance of having
good compilers, they're worth whatever they get paid.

But CPUs and compilers are put out by IC manufacturers, and they
understand chips better than software.  So they tend to put their money
into design work on the chip, and hire cheap programming labor to
produce less-than-thrilling compilers.

Since the manufacturers' compilers are often poor, third-party
operations spring up all over the place to try to cash in.  Typically
underfinanced, these operations *also* hire cheap programming labor and
produce less-than-thrilling compilers.  And the vacuum remains, so even
more third-party start-ups appear.

For heaven's sake, how many C compilers do we have to develop for the
68000 before we get one that's good???  Wouldn't it have been a whole
lot easier if Motorola or Microsoft or *someone* had put up the bucks
necessary to hire real compiler writers in the first place?

I think it makes more sense to take compiler-writing seriously, rather
than try to kludge the CPU so that every basement hacker can write what
he calls a "compiler".
--
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
                                                          ^^^^^--- soon to be CalComp
jbn@wdl1.UUCP (05/22/85)
     The idea is to make programs go fast.  This requires a machine for
which a compiler can generate fast code.  This is quite different from a
machine for which it is easy to generate code.

     One of the easiest architectures for which to generate code is the
true stack machine, where all operands are pushed on the stack and all
operators take data from the stack and return it to the stack.  The code
for such machines is reverse Polish notation, such as HP calculators
use.  UCSD Pascal P-code is the best known modern ``machine'' that works
this way, but many hardware machines, starting with the English Electric
Leo Marconi KDF9 in 1959, and many Burroughs machines from 1960 on,
worked this way.  The compilers are trivial.  But you can't optimize
effectively for a true stack machine.  Nor can the machine overlap or
pipeline operations effectively.  Because all operations implicitly
refer to the top of the stack, the independence of operations needed for
pipelining is very difficult if not impossible to achieve.

     Pipelined machines typically have many registers; the instruction
fetch/decode unit can then keep grabbing instructions and shipping them
off to the functional units for execution until blocked by a reference
to a register tied up by an operation in progress.  The CDC6600 and IBM
7030 (STRETCH) circa 1965 were the first machines that worked this way,
and the newer microprocessors are starting to use this technology.  The
Stanford MIPS machine does work this way, but lacks the hardware
interlocks (called the ``scoreboard'' in the CDC6600) to cause
instruction fetch/decode to block when a register conflict is detected;
the compiler for the MIPS machine has to stick in no-op instructions if
look-ahead would cause a register conflict.

     What the CPU designer concerned with speed really needs is a good
background in optimizing compiler technology and some knowledge of the
history of CPU architecture.

					John Nagle
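[A sketch from your editor, not part of Nagle's post: the "trivial
compiler target" he describes can be illustrated with a toy evaluator
for reverse Polish notation.  Every operand is pushed; every operator
pops its inputs and pushes its result, exactly as on a true stack
machine.  The function name and the tiny +/* instruction set are
invented for illustration.]

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Toy "true stack machine": evaluates space-separated RPN such as
   "3 4 + 2 *".  Only + and * are implemented; anything else in the
   input is treated as a separator. */
long eval_rpn(const char *prog)
{
    long stack[64];
    int sp = 0;
    const char *p = prog;

    while (*p) {
        if (isdigit((unsigned char)*p)) {
            char *end;
            stack[sp++] = strtol(p, &end, 10);  /* push operand */
            p = end;
        } else {
            if (*p == '+' || *p == '*') {       /* pop two, push result */
                long b = stack[--sp], a = stack[--sp];
                stack[sp++] = (*p == '+') ? a + b : a * b;
            }
            p++;                                /* operator or separator */
        }
    }
    return stack[sp - 1];
}
```

Note how the evaluator never names a register: every operation
implicitly touches the top of the stack, which is precisely the
serializing dependence Nagle says defeats pipelining.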
kds@intelca.UUCP (Ken Shoemaker) (05/23/85)
> [ Note that I added net.arch to the newsgroup, since this is probably
> where this discussion belongs ]
>
> > The purpose of a CPU is *NOT* to be as easy to write a compiler for
> > as possible.
>
> Not agreed.  If a machine is simple, the compiler is simpler, and thus
> it is available sooner, doesn't have as many bugs, etc.
>
> This is also the whole point behind RISC architecture, one of the
> rising stars at the moment.
> --
> Jack Jansen, jack@mcvax.UUCP
> The shell is my oyster.

Not entirely true.  RISC machines are not necessarily simple to
generate code for:

  - the only instructions that can access memory are mov (or load)
    operations
  - jumps jump only after the instruction after the jump has been
    executed
  - some don't have hardware interlocks to prevent a register being
    read before a previous register write has completed, so you have to
    remember to do enough in between so you don't have problems
  - they don't allow arbitrary byte boundaries for code/data

You can argue that this is merely code reorganization, but they are
implemented this way so that you can eliminate both hardware pipeline
stages and the delays in each stage.  Just my impressions...
--
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds

---the above views are personal.  They may not represent those of Intel.
jack@boring.UUCP (05/25/85)
I'm not sure whether Doug Pardee is serious, or just trying to keep the
discussion going.  I'll assume he *is* serious, and answer him anyway.

Doug> The purpose of a CPU is *NOT* to be as easy to write a compiler
Doug> for as possible.

me> Not agreed.  If a machine is simple, the compiler is simpler, and
me> thus it is available sooner, doesn't have as many bugs, etc.

Doug> Did I miss something here?  Since when is it any concern of mine,
Doug> as a user, whether the compiler is simple???

Exactly for the reasons stated above.  You don't want to argue that a
4000-line compiler is easier to maintain, debug, etc. than one of 8000
lines, I hope?

Doug> And I have seen no evidence that compilers for "simple" machines
Doug> are available any sooner, or are any more reliable, than
Doug> compilers for warpo machines.

No?  Get yourself a PR1ME, and try the Pascal compiler :-(

Now, I won't comment on the rest point-by-point, since it would be too
long-winded that way.  Let me just explain the following point: when
you are designing a machine, you are facing two size problems:

1. How do I fit all those transistors on this little square?
2. How do I fit all those opcodes in those 16 bits?

An orthogonal design is clearly good for (1), since it allows you to
use the hardware (or firmware) for calculating "x'100(a6:B)[d0]" many
times.  Now, to satisfy (2), you can do two things:

- Make the operand fields small, so you can have many opcodes.
- Make the opcode fields small, so you can have complicated operands.

(I won't go into RISC here, which makes both of them small.)

If you take the first choice, you can have lots of nifty instructions
like "search for a one bit, and return the position in a register" or
"copy a string and translate" and those kinds of things, which will
*never* be used by *any* compiler (except for COBOL, maybe), since most
high-level languages don't have a construct for that.  Can you imagine
a compiler that would recognize

	for(p=src, q=dst; *p; p++, q++)
		*q = table[*p];

and translate it into the above-mentioned instruction?

If you take the second branch, you will *not* have a string translate
instruction.  You will, however, have the ability to make your design
orthogonal.

Wirth (I think, I'm not sure) long ago measured that the average
expression has 1.5 operands.  This means that half of the instructions
you give will be expressible in *one* instruction, provided that the
machine lets you address something on the stack as an operand.  For
example:

	a += b;

orthogonal:
	add	b(r5),a(r5)

non-orthogonal:
	mov	a(r5),r4	<-- AND MAKE SURE IT'S FREE!!
	add	b(r5),r4
	mov	r4,a(r5)

Now, in cycles, the first one would result in 4 memory cycles and 3
additions, and the second in 6 memory cycles and 4 additions (PLUS an
additional 2 instruction decodes).

Well, this has got long-winded after all, sorry for that.  You may do
what you want, but I'll stick to hardware that was designed by software
people.
--
Jack Jansen, jack@mcvax.UUCP
The shell is my oyster.
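[Editor's sketch, not part of Jansen's post: the "copy a string and
translate" loop he quotes, made self-contained so it can actually be
run.  The names src, dst and table come from the quoted fragment; the
function name and the terminating '\0' handling are added here.]

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The C loop a compiler would see for "copy and translate": each byte
   of src is mapped through a 256-entry table into dst.  This is the
   operation a string-translate instruction would do in one opcode. */
static void copy_translate(const unsigned char *src, unsigned char *dst,
                           const unsigned char table[256])
{
    const unsigned char *p;
    unsigned char *q;

    for (p = src, q = dst; *p; p++, q++)
        *q = table[*p];
    *q = '\0';          /* terminate the translated copy */
}
```

A compiler would have to pattern-match exactly this loop shape to emit
the fancy instruction, which is Jansen's point about why such opcodes
go unused.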
johnl@ima.UUCP (05/27/85)
While we're all stomping on doug@terak, here's another little reason
why ugly architectures are of concern to other than compiler writers:
you don't implement a language in a vacuum, and when you're dealing
with a really ugly chip like the 80?86 series, the language being
compiled often gets bent so that the compiler writer can finish his job
in a finite amount of time.

Every C compiler ever written for the '86 series has ended up having
several code "models" which do their data and addressing in various
ways that trade off size of usable address space vs. compactness and
speed of object code.  Some compilers have even added "near" and "far"
pointer declarations so that the user can give advice to the compiler
about how to handle dereferencing each pointer.  This means that every
compiler user who wants to compile usefully large programs and still
have them run fast has to learn more than he ever wanted about the
strange warts of the '86's segmented addressing.  I deal with exactly
this problem every day (and I promise, I'm not writing compilers) and
it's getting awfully irritating.

Some may find this morally indefensible, but if you put all of the
compiler experts in the world into a room, they still couldn't find a
way to generate decent code for an '86 that appeared to have a linear
address space like C code most naturally wants.

So as has been said before, the 8086 and 286 are fine for
high-performance vending machines, but for real computing, please, give
us anything else.  Clever compilers can't paper over this yawning
chasm.

John Levine, ima!johnl
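[Editor's sketch, not part of Levine's post: the segmented addressing
behind those memory models.  In 8086 real mode a physical address is
formed as segment * 16 + offset, so any single 16-bit offset spans only
64KB; the function name below is invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode address formation: a 16-bit segment and a 16-bit
   offset combine into a 20-bit physical address.  Many segment:offset
   pairs alias the same byte, and no single pointer covers more than
   64KB, which is why compilers grew "near"/"far" pointers and several
   code/data models. */
static uint32_t linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, 1234:0010 and 1235:0000 name the same physical byte, so a
compiler cannot even compare two far pointers for equality without
normalizing them first.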
cdshaw@watmum.UUCP (Chris Shaw) (05/27/85)
>The purpose of a CPU is *NOT* to be as easy to write a compiler for as
>possible.

All right then, what IS the purpose of a CPU??  It would seem to me
that the purpose of a CPU is to run programs.  The purpose of a
well-designed instruction set is to make it as easy to program as
possible without sacrificing performance.

Now, it also seems to me that an intelligent CPU design takes into
account the types of programs that will run on it.  Thus, it's obvious
that the 8035 was never designed to be anything more than a controller.
When designing the 32032, then, the kind of programs the designers of
the chip had in mind were those that would be created by high-level
languages.  Thus, they made the instruction set as easy as possible to
write compilers for.

Obviously, orthogonality doesn't matter quite so much on a controller,
where the programmer is a human, not a program.  On a general-purpose
CPU, however, most programs will be created by programs (compilers), so
it makes sense to tailor the instruction set to its intended
programmers.  Anybody who has written a compiler will tell you that
ortho machines are easier to write compilers for.  It's a simple fact
that has been true since day 1.

The benefits of programs that are easy to write vs. hard to write are
as follows:

1) Productivity of the programmer is much higher.  Despite Mr.
   Trissel's comments, compiler writers are harder to come by than
   (say) COBOL programmers, and are therefore more expensive.  Simply
   asking for better programmers doesn't solve this problem.
   Therefore, the more productive your programmers, the better.  Of
   course, if the market for 8035 C compilers is twice that for 68000 C
   compilers, then maybe start writing 8035 stuff, but that's another
   matter entirely.

2) Program correctness (lack of compiler bugs).  All things being equal
   (which they aren't), a compiler for a weird machine produced from N
   man-months of labour will generally be less right than that for an
   ortho machine.  This point is really an outgrowth of productivity.
   Almost as importantly, a compiler for an ortho machine will be
   easier to maintain and fix bugs in than one for a non-ortho machine,
   since there is no complicated register-assignment algorithm, etc.

3) Object code speed.  Given that CPUs x and y have the same hardware
   but different instruction sets (two microcode sets, say), compiler
   code produced for the ortho version is most likely going to be
   faster, since special-purpose register decisions are not reflected
   in the code.  In other words, non-orthogonality generates
   superfluous moves that would probably not be necessary on an ortho
   machine.  This point is true whether the code is compiler- or
   human-produced.  The lack of a general reg-to-reg add on the Z80 is
   cause for many wasted reg-to-reg MOVs, or (worse) reg-to-memory
   MOVs, for example.

>I think it makes more sense to take compiler-writing seriously, rather
>than try to kludge the CPU so that every basement hacker can write
>what he calls a "compiler".
>--
>Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug

I think this point is ripe nonsense.  The bit which grabs me worst, of
course, is the twisted use of the word "kludge".  And as for this
garbage about basement hackers, well....  (I guess it's time to go
upstairs for a beer & mellow out :-)

Chris Shaw    watmath!watmum!cdshaw or cdshaw@watmath
University of Waterloo
In doubt?  Eat hot high-speed death -- the experts' choice in gastric
vileness!
g-frank@gumby.UUCP (05/28/85)
> Every C compiler ever written for the '86 series has ended up having
> several code "models" which do their data and addressing in various
> ways that trade off size of usable address space vs. compactness and
> speed of object code.
>
> . . . if you put all of the compiler experts in the world into a
> room, they still couldn't find a way to generate decent code for an
> '86 that appeared to have a linear address space like C code most
> naturally wants.
>
> So as has been said before, the 8086 and 286 are fine for
> high-performance vending machines, but for real computing, please,
> give us anything else.  Clever compilers can't paper over this
> yawning chasm.
>
> John Levine, ima!johnl

Clever compilers for almost any language but C can paper over most
sorts of yawning chasms.  The 8086 series is not the first processor
without a large linear address space, and it won't be the last.  The
problem is that C is a programming language written with a particular
machine storage model in mind, and it ports poorly to other
architectures.  Modula-2, Pascal, Ada, all are languages that port
quite well to the 8086 family, and produce efficient, readable code
without any sort of trickery required of the programmer.

The problem is C, not Intel.  If you have programs that require
enormous data arrays, you picked the wrong processor, didn't you?
Otherwise, you just picked the wrong language.

Do try to desist from characterizing particular processors as being
"suitable for vending machines," by the way.  I have a stupid 68000
system in my basement that I can't use and can't sell because there's
no software for it, and one of those vending machines sitting on my
desk.
--
Dan Frank

Q: What's the difference between an Apple MacIntosh and an
   Etch-A-Sketch?
A: You don't have to shake the Mac to clear the screen.
johnl@ima.UUCP (05/29/85)
Let's continue this argument in net.arch, which is more appropriate.

John Levine, ima!johnl
wjafyfe@watmath.UUCP (Andy Fyfe) (05/29/85)
In article <387@gumby.UUCP> g-frank@gumby.UUCP writes:
> The problem is C, not Intel.  If you have programs that require
> enormous data arrays, you picked the wrong processor, didn't you?
> Otherwise, you just picked the wrong language.

The problem isn't just C.  Write a Fortran subroutine with variable
array bounds and the Intel Fortran compiler, not being able to bound
the array, will assume the worst (generating very scary code,
particularly if it's a multidimensional array).  For numerical work the
above is very likely; a pity, given that the 8086 family has had
floating point hardware for so long.

--Andy Fyfe        ...!{decvax, allegra, ihnp4, et. al}!watmath!wjafyfe
                   wjafyfe@waterloo.csnet
thomson@uthub.UUCP (Brian Thomson) (05/29/85)
Chris Shaw writes about orthogonality:
>When designing the 32032 then, the kind of programs the designers of
>the chip had in mind were those that would be created by high-level
>languages.  Thus, they made the instruction set as easy as possible to
>write compilers for.
>On a general-purpose CPU, ... most programs will be created by
>programs (compilers), so it makes sense to tailor the instruction set
>to its intended programmers.

In my experience, the difficulty of (decent) compiler construction is
affected less by orthogonality than by the number of code sequences
that must be considered when implementing a given source language
construct.  The C statement

	a = b * c + d + e;

might, in different contexts, be implemented on your 32032 as:

	movd	_b,r0
	muld	_c,r0
	addd	_d,r0
	addd	_e,r0
	movd	r0,_a

or, if c is the constant 2, d a stack local, and e the constant 4,

	movd	_b,r0
	addr	4(-4(fp))[r0:w],_a

or even, if b, c, and d are all unsigned shorts, and e == b,

	movzwd	_b,r0
	indexw	r0,_c,_d	; b * (c+1) + d
	movd	r0,_a

Does that last one look ridiculous?  That's exactly my point: it's the
best code sequence under the given set of assumptions, and no compiler
would ever find it.

If these fancy addressing modes and high-level language oriented
instructions could be added without penalizing the performance of
bread-and-butter instructions, I'd be all for it, but such is never the
case.  If a machine forces me to put something in a data register
before I can add to it, and has no exceptions to this rule, it will be
easy to generate code.  It only gets tough when there are options.
--
Brian Thomson, CSRI Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson
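[Editor's check, not part of Thomson's post: his indexw sequence leans
on the identity b*(c+1)+d == b*c+d+e when e == b.  The two helper
functions below are invented names; they just verify that the clever
sequence and the straightforward one compute the same value.]

```c
#include <assert.h>

/* What "indexw r0,_c,_d" produces after "movzwd _b,r0": b*(c+1)+d. */
static long via_indexw(unsigned short b, unsigned short c,
                       unsigned short d)
{
    return (long)b * (c + 1) + d;
}

/* The original source expression, a = b * c + d + e. */
static long straightforward(long b, long c, long d, long e)
{
    return b * c + d + e;
}
```

The identity holds for any values once e == b, which is exactly the
context-specific assumption Thomson says no compiler would discover.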
seth@megad.UUCP (Seth H Zirin) (05/30/85)
> Dan Frank writes:
>
> Clever compilers for almost any language but C can paper over most
> sorts of yawning chasms.
> The problem is that C is a programming language written with a
> particular machine storage model in mind, and it ports poorly to
> other architectures.

C ports well to any large linear address space, stack-oriented
processor.  In addition, it ports well to small linear address space
processors like the 6809.  I've used it on IBMs, Univac 1100s, VAXen,
and all of the 680x0 processors.

> The problem is C, not Intel.  If you have programs that require
> enormous data arrays, you picked the wrong processor, didn't you?
> Otherwise, you just picked the wrong language.

WRONG!  C lets you exploit a machine's strengths at the expense of not
hiding its weaknesses.  If you've picked Intel for C, you've picked the
wrong processor; the 68000 doesn't need to sweep its weaknesses under a
high-level-language rug.  C does, however, work nicely with Intel
PROMs.

> I have a stupid 68000 system in my basement that I can't use and
> can't sell because there's no software for it, and one of those
> vending machines sitting on my desk.

If you can't use a 68000-based machine, you're probably in the wrong
field.  On the topic of selling it, send me EMAIL with your asking
price.
--
Name: Seth H Zirin
UUCP: {decvax, ihnp4}!philabs!sbcs!megad!seth
Keeper of the News for megad
guy@sun.uucp (Guy Harris) (06/01/85)
> The problem is that C is a programming language written with a
> particular machine storage model in mind, and it ports poorly to
> other architectures.  Modula-2, Pascal, Ada, all are languages that
> port quite well to the 8086 family, and produce efficient, readable
> code without any sort of trickery required of the programmer.

Umm... Pascal has pointers, "new", and "dispose", just like C has
pointers and its library has "malloc" and "free".  How would you write
a program that, say, manipulated large trees requiring >64KB worth of
node storage in Pascal?  Probably similarly to how you'd write it in C.

Now imagine that program dealing with two nodes at the same time.
Well, you load one pointer into the DS register and one of the
general-purpose registers, and the other one into the ES register and
one of the general-purpose registers.  You have to kludge a bit with
the "use the ES register" prefix, but the advance information data
sheet I have in front of me doesn't list any times for the
segment-selection prefix, so I assume it takes no time.

Now imagine that program dealing with three nodes at the same time -
say it's an expression tree, and it's evaluating an addition by adding
the LHS to the RHS and storing the result in the parent node.  Well,
you load one pointer into the DS register and one of the
general-purpose registers, the second one into the ES register and one
of the general-purpose registers, and the third one into the EES
register and one of the general-purpose registers...

Oops.  There *is* no EES register.  Oh well, shuffle shuffle
shuffle....  (If you can get good code for this one, try something
where you have to use each pointer more than once.)

The same data sheet says a direct intersegment call takes 19 more
clocks than a direct within-segment call in protected mode, and even in
real address mode it takes 6 more clocks.  Load-pointer-to-DS/ES
instructions take 21 clocks in protected mode and 7 in real address
mode.  Using those segment registers dynamically rather than statically
is not cheap.

Do you have evidence that compilers for the other languages you mention
can solve problems like this much better than compilers of equivalent
quality for C?  If so, can you show that the difference is due to some
characteristic of the languages in question?

	Guy Harris
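[Editor's sketch, not part of Harris's post: the expression-tree step
he describes, written out in C.  The node layout and function name are
invented; the point is only that one statement dereferences three
pointers at once, each of which would be a far (segment:offset) pointer
in an 8086 large-model compile, competing for the two data segment
registers DS and ES.]

```c
#include <assert.h>
#include <stddef.h>

/* A minimal expression-tree node.  Evaluating an addition touches the
   parent, LHS, and RHS nodes simultaneously: three live pointers. */
struct node {
    long value;
    struct node *lhs, *rhs;
};

static void eval_add(struct node *parent)
{
    /* Three dereferences in one statement; on the 8086 each would need
       its own segment register load before use. */
    parent->value = parent->lhs->value + parent->rhs->value;
}
```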
doug@terak.UUCP (Doug Pardee) (06/03/85)
Wait a second!  It looks like I should have used one of my "patented"
200-line postings, because an awful lot of people have misinterpreted
my comments.

The original posting to which I had responded did *not* say that EA
orthogonality would result in better compiled code.  It said that EA
orthogonality would allow the compiler writer to save himself the
trouble of swapping operands on a compare instruction and logically
inverting the branch condition.  This does *NOT* improve the
performance of the compiled code.  In fact, on the NS320xx CPUs (the
only ones around with 2-address architecture), a "backwards" compare
instruction takes an extra 2 clock cycles of execution time.

I have no objection to compiler writers who wish to make a case that EA
orthogonality will result in better compiled object code.  But I object
strenuously to the notion that regardless of whether it would benefit
or hurt the users, the CPU architecture should be changed to please
lazy compiler writers.  EA orthogonality should be argued on the basis
of the efficiency of the resulting object code, not on the ease with
which the handful of compiler writers can do their job.

Some of the notes have indicated that these concerns are one and the
same.  Sometimes, but not always.  Here's a choice counter-example:
some RISC machines have a "branch *after* next instruction" operation.
This allows the pipeline to be used more efficiently.  It results in
more efficient object code than conventional branch instructions, but
it is a booger-bear to write an effective compiler for.

A lot of folks have also suggested that compilers which were easily
written (I call them "hastily knocked out" :-) are more bug-free than
ones that took some time to implement.  I maintain that the quantity of
bugs is related to the quantity and quality of design and debugging.
Now how much design and debugging do you expect to get from a compiler
writer who thinks that putting the operands of a "compare" instruction
in the proper order is "too much work"?

It is also said that good compilers take longer to produce than crummy
ones.  True.  Are we all so impatient that we'd rather have a crummy
compiler now than wait six months for a good one?  And it has been said
that good compilers cost more than crummy ones.  I'm not exactly
surprised.  Isn't there an old saw about "only getting what you pay
for"?

I suggest that part of the problem here is that a lot of folks who are
reading this hope to write The Great American Compiler.  They weren't
planning on spending the time and money to write a good compiler.  And
they don't much care for hearing suggestions that users don't want to
buy crummy compilers.  (Have at it, my mailbox is asbestos-lined now.)
--
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
                                                          ^^^^^--- soon to be CalComp
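[Editor's sketch, not part of Pardee's post: the condition bookkeeping
he is talking about.  When a code generator emits the compare with its
operands swapped, it must adjust the condition so the branch still
tests the same thing.  The enum and function name below are invented
for illustration.]

```c
#include <assert.h>

/* Condition codes for a two-operand compare "cmp a,b". */
enum cond { LT, LE, GT, GE, EQ, NE };

/* If the generator emits "cmp b,a" instead of "cmp a,b", the condition
   for "a OP b" must be mirrored: a < b is the same test as b > a, and
   so on.  EQ and NE are symmetric and survive the swap unchanged. */
static enum cond swap_operands(enum cond c)
{
    switch (c) {
    case LT: return GT;
    case LE: return GE;
    case GT: return LT;
    case GE: return LE;
    default: return c;
    }
}
```

This is the small, mechanical transformation Pardee says a compiler
writer should simply do, rather than ask for the instruction set to be
reshaped around it.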
rap@oliveb.UUCP (Robert A. Pease) (06/05/85)
> You may do what you want, but I'll stick to hardware that was
> designed by software people.
> --
> Jack Jansen, jack@mcvax.UUCP
> The shell is my oyster.

The thing that I keep thinking about is that every paper, article,
text, or whatever I have seen on the subject says that the best way to
design a system is to first decide what the application will be and
then design the hardware to support the design goals.  Seems to me,
then, that an orthogonal architecture would support high-level
languages much better than one that is not orthogonal, or do I just see
things more clearly than others :-).
--
Robert A. Pease
{hplabs|zehntel|fortune|ios|tolerant|allegra|tymix}!oliveb!oliven!rap
paul@greipa.UUCP (Paul A. Vixie) (06/05/85)
In article <210@uthub.UUCP> thomson@uthub.UUCP (Brian Thomson) writes:
>The C statement
>	a = b * c + d + e;
>
>might, in different contexts be implemented on your 32032 as:
>	movd	_b,r0
>	muld	_c,r0
>	addd	_d,r0
>	addd	_e,r0
>	movd	r0,_a
>
>or, if c is the constant 2, d a stack local, and e the constant 4,
>	movd	_b,r0
>	addr	4(-4(fp))[r0:w],_a
>
>or even, if b, c, and d are all unsigned shorts, and e == b,
>	movzwd	_b,r0
>	indexw	r0,_c,_d	; b * (c+1) + d
>	movd	r0,_a

Or, how about:

	; extern long int a, b, c, d, e;
	; a = b * c + d + e;
	movd	ext(_b), tos
	muld	ext(_c), tos
	addd	ext(_d), tos
	addd	ext(_e), tos
	movd	tos, ext(_a)

	; extern long int a, b;
	; #define c 2
	; auto long int d;
	; #define e 4
	; a = b * c + d + e;
	movd	ext(_b), tos
	muld	2, tos
	addd	4(fp), tos
	addd	4, tos
	movd	tos, ext(_a)

	; extern long int a;
	; extern unsigned short int b, c, d;
	; a = b * c + d + b;
	movzwd	ext(_b), tos
	movzwd	ext(_c), tos
	muld	tos, tos
	movzwd	ext(_d), tos
	addd	tos, tos
	movzwd	ext(_b), tos
	addd	tos, tos
	movd	tos, ext(_a)

The above code is not very pretty nor efficient.  In each case I have
done five operations: move, multiply, add, add, move.  The only real
difference is in the addressing modes; this seems common of
compiler-generated code.

I am no longer (thank <insert deity here>) an expert on the 68xxx, but
I don't remember an external or frame-relative addressing mode; one
assumes that the many otherwise useless address registers will be used
to hold the current global and frame pointers, and the loader has a lot
of fixing up to do on those globals - every reference needs
modification, not just an extern table (unless you plan to have your
compiler generate enough low-level stuff to do what the 32xxx external
addressing mode does automagically).  Not being a compiler writer (yet
:-), anyway) I don't see many other things a compiler could optimize
for (except the "muld 2, tos", which could have been "ashd 1, tos", but
only VAX-11 C from DEC does this).

I do know that the 68xxx's addressing modes and strange restrictions on
address and data registers are more characteristic of RISC than of a
machine with all those instructions.  Can the 68xxx even do an
"addd -(sp), (sp)" without doing the pop at the wrong time?  The one I
worked with didn't have any memory-to-memory instructions; you could do
register-to-memory, memory-to-register, or register-to-register, but
they were all different instructions (in fact, different instructions
for address and data registers, and that's when they felt like
providing them - often you had to move into an (address or data)
register from a (data or address) register to do a simple operation).

Gosh, what a ramble.  Sorry about that, everybody.  My point in all
this is that a compiler can generate *clean* code *easily* for the
32xxx because of all the neato addressing modes; generating code for
the 68xxx is either (easy, ugly, inefficient) or (hard, functional,
efficient), but that's like a choice between the electric chair and the
gas chamber.

Paul Vixie
{pyramid,dual,decwrl}!greipa!paul
mark@rtech.UUCP (Mark Wittenberg) (06/05/85)
> For example:
>	a += b;
> orthogonal:
>	add	b(r5),a(r5)
> non-orthogonal:
>	mov	a(r5),r4	<-- AND MAKE SURE IT'S FREE!!
>	add	b(r5),r4
>	mov	r4,a(r5)
>
> Now, in cycles, the first one would result in 4 memory cycles and 3
> additions, and the second in 6 memory cycles and 4 additions (PLUS
> an additional 2 instruction decodes).
>
> --
> Jack Jansen, jack@mcvax.UUCP
> The shell is my oyster.

And furthermore, the orthogonal sequence is normally atomic; in an OS
kernel the non-orthogonal sequence might easily have to be protected by
a disable/enable-interrupt sequence around it, or "test-and-set" or
some such in a multi-processor system (e.g., "a" and "b" might be
global vars).  Multi-process user programs would need "enter/exit
monitor" or "block-on-semaphore" sequences.  Besides being a pain
(sometimes a royal pain), this has the potential for eating a lot of
CPU time.
--
Mark Wittenberg
Relational Technology
zehntel!rtech!mark
ucbvax!mtxinu!rtech!mark
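[Editor's sketch, not part of Wittenberg's post: the lost-update window
he is warning about.  The three-step non-orthogonal sequence leaves a
gap between the load and the store; an interrupt handler that modifies
the variable inside that gap has its update silently overwritten.  The
names and the simulated interrupt are invented for illustration.]

```c
#include <assert.h>

static long a, b;

/* Stands in for an interrupt handler that also updates `a`. */
static void interrupt_handler(void) { a += 100; }

/* "a += b" expanded the non-orthogonal way:
       mov  a(r5),r4
       add  b(r5),r4
       mov  r4,a(r5)
   with an interrupt landing between the load and the store. */
static void nonatomic_add(void)
{
    long r4 = a;          /* load  */
    interrupt_handler();  /* interrupt fires mid-sequence */
    r4 += b;              /* add   */
    a = r4;               /* store: the handler's +100 is clobbered */
}
```

After the sequence runs, `a` holds only the stale load plus `b`; the
handler's contribution is gone, which is why a kernel must bracket such
sequences with interrupt disable/enable or a test-and-set lock.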