kar@ritcv.UUCP (05/07/84)
<> Having collected all of the replies to my posting, it is now time for a
reply.  First, a disclaimer -- I am not advocating that reduced-instruction-set
machines themselves go away.  Whether the concept is good or not is a
completely different debate.  It is the Pyramid's boundary-alignment
requirement that I object to.  I'll answer arguments in more or less the
order in which I received the articles here.

Henry Spencer (Univ of Toronto Zoology) writes, in part:

> Simple hardware wins on more counts than just making life easier for
> lazy engineers.  It is simpler and cheaper to build, ..., more reliable,
> easier to fix when it breaks, etc etc.

Yes, simple hardware is easier to design, easier to fix, and all of the rest.
But it is a lot less useful.  How simple should it be?  An abacus is pretty
simple, for example.

> (omitted from above) simpler for the software to run (comparing an 11/44 to
> the nightmare complexity of the VAX i/o structure, for example, tells you
> why 4.2BSD is so bloated)

You are confusing implementation with interface.  The functions you choose to
implement do not determine the interface between the hardware and the
software; consider the many different ways that different machines support
I/O.  On the Vax, it is the complexity of the interface that causes the
problems, not the underlying implementation.

> Don't forget that magic word "cheaper".  It has become fashionable
> to say "software costs totally dominate hardware costs", but most
> people forget to add "...unless you can't afford the hardware in the
> first place".  Hardware and software money don't usually come out of
> the same pot, and the people who make decisions about such things are
> not necessarily as enlightened as we are.

Then enlighten them!  If you buy a machine on which software development is
difficult only because that machine is cheaper, you're not making a very
good decision.
It's up to those who know about such things to educate those that "make the
decisions" about the false economy of choosing a machine based only on price.

> And once again, don't forget the example of the VAX: sure, it looks
> like a nice machine, but it's grossly overpriced for its market now.
> This is despite massive use of [semi-]custom ICs on the more recent
> VAXen -- and you would not believe what a disaster that is for
> production and maintenance!  (There is an awful lot to be said for
> using standard parts, which means restricting yourself to things that
> can be built economically with them.)

Are you suggesting that it would have been better to build Vaxen from 7400
series logic?  I think not.

> I have heard, from reliable sources, that if/when the successor to the VAX
> emerges, the biggest difference will be that it will be much simpler.

That's the interface again, not necessarily the implementation.

> If you can show me a way to eliminate alignment constraints without a
> speed penalty, WITHOUT adding large amounts of hardware (which I could
> use better to make the aligned version faster), I'd love to hear about
> it.  It's hard.

Page 200 of the Vax Hardware Handbook describes how it is done with a cache
on the 780.  The same can be (is!) done with data, using a larger cache to
compensate for the less sequential nature of data accesses.

> But actually, most of this is beside the [original] point.  We are not
> talking about some decision which makes life a lot harder for the poor
> software jockey.  We are talking about a decision which requires more
> memory to get equivalent performance.  There is a perfectly straight-
> forward hardware-vs-hardware tradeoff here: is it cheaper to build
> a machine that doesn't care about alignment, or to just stack more
> memory on a machine that does care?  I would give long odds that the
> latter approach wins, even just on initial cost.  When you think about
> things like reliability and maintenance, it wins big.
Good point.  However, for programs that dynamically allocate space, the size
of the largest problem that can be handled is determined by how efficiently
you use that space.  For ANY given memory size, the program that more
efficiently uses the space can handle larger problems.

Of greater concern, though, is where all that data resides when the program
is not actually running.  Can you add additional disk space to hold all of
the wasted space in your data structures as cheaply as you can add main
memory?  I would be upset if I had to buy another drive because all of my
existing ones were full of data that was 25% wasted space.  Sure, you can
pack them on disk and unpack them when you read them, but you are then
trading away execution efficiency.

> I agree that this doesn't help the poor people who have made a big
> investment in data structures that assume no alignment constraints.
> These people have made a mistake, period: they have imbedded a major
> machine-dependent assumption in software that obviously should have
> been portable.

This is my whole point -- alignment should NOT be machine dependent.

This from Spencer W. Thomas:

> A current trend in computer design is to assume that the user will only
> be writing in a high-level language, and that the compiler will do the
> hard work of generating machine code.  This is the theory behind RISC
> machines, in particular.  Making the hardware simpler makes it run faster.

Note: RISC machines simplify the interface to the machine, the machine
language.  The point of this is to simplify the generation of optimal code.
The speed of the machine is determined by the implementation, not the
interface.

> Once we start getting really convoluted machines (such as ELI, or some
> pipelined machines which execute several instructions after a
> conditional branch, before actually branching), all your clever hacks
> based on assumptions about the hardware will just go straight down the
> tubes.
> If the compiler were smart enough, it would say "Oh, he's trying to
> access a longword, but it's not on the right boundary", and generate a byte
> move instruction to align it before accessing.

Huh?  If the implementation is allowed to screw up the interface, then the
instructions won't be doing what you think they should (e.g. executing
several instructions after a conditional branch before actually branching).
To overcome this, the compiler would have to be pretty smart.

As for automatically checking whether a byte move is necessary, that's fine
for statically allocated structures.  For any structure accessed through a
pointer, however, a run-time check would be required.  Again, we're trading
hardware complexity for performance.  If you want blinding speed, do it in
hardware.

> The basic problem is that generality is slower.  For really blinding
> speed, you always have to give up something.  With big memories,
> arbitrary alignment is not too hard to give up.  (I bet that the
> original application never put longwords on odd byte boundaries, now did
> it?)

The original application DID have "longwords on odd byte boundaries" --
that's what caused the whole discussion.  Given that and the discussion
above, the fastest solution is to have the hardware (not the software) take
care of non-aligned data.

From mprvaxa!tbray:

> His argument is that byte addressability is a win because of the
> greater ease in writing software and the high cost today of software
> vs hardware.
> Not so!  Because...
> 1. All significant machine code today is generated by compilers, not
> by people, and the compilers do the messy work of aligning everything.

Only if you're willing to pay the price -- see the discussion of disk space
and maximum problem size above.

> 2. Removing the byte-addressability constraint allows the hardware boys
> to build much more cost-effective architectures, and to build them
> quicker.
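[Editorial aside: the run-time check discussed above -- deciding at each
access whether a byte-by-byte fetch is needed -- can be sketched in
present-day C.  The function name is illustrative, and memcpy stands in for
the byte moves a compiler would have to generate on an "aligned" machine.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the run-time alignment check a compiler would
 * have to emit for a load through a pointer whose alignment is unknown
 * until run time.  Every access pays for the test -- this is the
 * software cost being traded against unaligned-operand hardware. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    if (((uintptr_t)p & 3) == 0) {
        /* Aligned case: a single fetch. */
        v = *(const uint32_t *)p;
    } else {
        /* Unaligned case: byte moves into an aligned temporary. */
        memcpy(&v, p, sizeof v);
    }
    return v;
}
```

[The slow path is exactly the "byte move instruction" Thomas describes;
the point of contention is whether that test belongs in every compiled
access or in the memory hardware.]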
It should be no surprise that less useful hardware is cheaper and faster to
build.

> 3. Point number 2 is vital since the line about the rising cost of
> software with respect to hardware is so much horse puckey.  All those
> graphs of the future that showed a graph that looked like an X, the
> rising line being software and the falling line hardware, never happened.

Let's look at the Vax again.  Compare the cost of developing the hardware to
the cost of all of the software that runs on it, and see the complaint above
about how the complexity of the Vax resulted in a "bloated" 4.2 Unix.

> The reason being that the demand grows at a phenomenal rate and every
> year software becomes more layered, more functional, and less hardware-
> efficient (*see note).  Which is as it should be.  So quick, cheap
> architectures are IMPORTANT.

If you're talking about operating systems, then yes.  How many operating
systems do you know of, though, that provide application-specific
functionality?  Until that happens, complex application systems will remain
expensive to implement, especially PORTABLE ones.  Reducing needless
differences between machines will make this simpler.

> If somebody can build, say, a Ridge 32 and it runs a really nice UNIX (it
> doesn't yet) and goes like hell for < $50K (it does), I'll cheerfully
> grapple with nUxi and alignment problems in my existing software.

I wouldn't.

> As to the reduced machine efficiency of modern software, this was really
> brought home to me one time I was touring a DEC manufacturing plant, and
> there were these 6 huge backplane-wiring machines running flat out, all
> being driven by a little wee PDP-8 (!).  When I expressed surprise, the
> manager explained that the 8 could do it with room to spare because there
> was no operating system to get in the way...

So what?
This particular application didn't need any of the capabilities provided by
a modern operating system, such as multiple users, paging, device-independent
I/O, networking, etc.

And finally, from hhb!bob:

> Now let me flame at the folkz who felt compelled to tell me that we had
> written the code completely wrong.  These responses were just typical
> (and as I had expected) of UN*X snob types with little understanding of
> what it takes to develop major software systems.  With attitudes like
> that we ought to just throw most of UN*X out the window.  Do you have
> any idea how much effort we spent making the UN*X utilities work on a
> machine that did not have character pointers the same size as all other
> pointers?  (This was for the word addressed machine I had previously
> mentioned.)  It was months, and an extremely tedious job.  So obviously
> they wrote UN*X wrong PERIOD.

Right on.  Remember that, prior to 4.2, block addresses in inodes were
stored on the disk as 3 bytes.  Why?  To save space on the disk, that's why!
Of course Unix is not "wrong PERIOD" -- it is a REAL software product, the
result of compromise and continual change.

In conclusion -- the fewer differences there are between machines, the
easier it will be to port software.  I do not mean to imply that all machines
should be identical; it still makes sense to have "small" machines for small
applications and large machines for large applications, or machines with
special quirks for quirky applications.  Within any given group, however,
non-essential differences should be eliminated from the architectures!
Manufacturers that do this will be rewarded with increased sales IF the
software engineers educate those who hold the purse strings about the
economies of producing software.

    Ken Reek, Rochester Institute of Technology
    {allegra,seismo}!rochester!ritcv!kar
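[Editorial aside: the 3-byte inode block addresses Reek mentions are a
concrete space-for-time trade.  A sketch in present-day C -- the function
names are illustrative, not the actual kernel's -- shows the packing and the
extra shifting it costs on every access:]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of storing a block number in 3 bytes instead of 4, as the
 * pre-4.2 file system did on disk.  Saving one byte per address costs
 * shift-and-mask work on every read and write of the field. */
void pack24(uint8_t out[3], uint32_t blk)
{
    out[0] = (uint8_t)(blk >> 16);
    out[1] = (uint8_t)(blk >> 8);
    out[2] = (uint8_t)blk;
}

uint32_t unpack24(const uint8_t in[3])
{
    return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8)
         |  (uint32_t)in[2];
}
```

[Note that the packed form also has no alignment at all: the bytes are read
individually, so the code works identically on aligned and unaligned
machines.]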
dmmartindale@watcgl.UUCP (Dave Martindale) (05/10/84)
To address your last point first, I do not see how you can agree that UNIX
was written "wrong" because parts of it assume that all pointers will fit in
an int, and then say that your software is right and the hardware is wrong
for not allowing non-aligned references.  If you believe that all hardware
should allow non-aligned references for the benefit of the software, why
don't you also argue that all hardware should use byte addressing, with all
pointers the same length and the pointer the same size as the int?  These
latter restrictions are actually far more important to porting much code
than having the processor do unaligned fetches.

Hardware design is a series of tradeoffs.  It would be nice to have hardware
that would accept data on arbitrary byte boundaries.  It would be even nicer
to extend that to arbitrary bit boundaries.  It would be nice if no machine
had an address shorter than 24 bits; 32 would be much better.  It would be
nice if all machines had floating-point instruction times comparable to
their integer instruction times.  It would be nice to have virtual memory
capability on all machines (this makes a much greater difference in "size of
problem that can be handled" than the ability to pack data to eliminate
wasted space).

But all "it would be nice to have" features cost something -- in speed,
cost, power, size of board, or somewhere.  Manufacturers will continue to
decide to include or exclude a feature based on these tradeoffs.  Computer
users will continue to evaluate the machines produced in light of their
capabilities, restrictions, and the applications they are intended for.
Now, if you choose to ignore all machines which will not do unaligned
fetches, that is your right.  But please do not badmouth other people who
see other issues as more important to what they do.
And if you really want to call your software "portable", it should run on
machines which have different sizes of pointers, different byte orders
within the word, different native character sets, and even restrictions on
the alignment of data.  You make software which is portable by making it
independent of the details of the implementations of a wide variety of
machines, not by arguing that a certain class of those machines should not
be allowed to exist.

    Dave Martindale
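[Editorial aside: Martindale's recipe -- depend on a defined external
format, never on any one machine's native layout -- can be sketched in
present-day C.  The big-endian external format and the function names are
assumptions for illustration:]

```c
#include <assert.h>
#include <stdint.h>

/* Portable external representation: a 32-bit value is written as four
 * bytes in a defined order, so the file reads back identically on
 * machines with different byte orders, word sizes, and alignments. */
void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

[Code written this way never notices whether the host requires alignment,
which is precisely the independence Martindale argues for.]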
henry@utzoo.UUCP (Henry Spencer) (05/17/84)
Dave Martindale has addressed some of Ken Reek's comments; here's some more
rebuttal...

> Simple hardware wins on more counts than just making life easier for
> lazy engineers.  It is simpler and cheaper to build, ..., more reliable,
> easier to fix when it breaks, etc etc.

    Yes, simple hardware is easier to design, easier to fix, and all of the
    rest.  But it is a lot less useful.  How simple should it be?  An abacus
    is pretty simple, for example.

The whole point of the RISC notion is that the hardware can be made
dramatically simpler *without* losing anything important.  You haven't
demonstrated that "aligned" machines lose anything important -- the
inability to run unportable software is hardly significant, or the RISC
would be doomed by its inability to run VMS.

> (omitted from above) simpler for the software to run (comparing an 11/44 to
> the nightmare complexity of the VAX i/o structure, for example, tells you
> why 4.2BSD is so bloated)

    You are confusing implementation with interface.  The functions you
    choose to implement do not determine the interface between the hardware
    and the software; consider the many different ways that different
    machines support I/O.  On the Vax, it is the complexity of the interface
    that causes the problems, not the underlying implementation.

Complexity of implementation usually rears its ugly head in the interface as
well.  Even things like caches and unaligned-operand features do show up, if
you think about it carefully.  Ask someone who's written operating-system
memory-management code what he thinks of unaligned operands that can
straddle page boundaries.  No, unaligned operands are *not* free of software
complexities, although the extra complexity they introduce is at least
localized.

> Don't forget that magic word "cheaper".  It has become fashionable
> to say "software costs totally dominate hardware costs", but most
> people forget to add "...unless you can't afford the hardware in the
> first place".
> Hardware and software money don't usually come out of
> the same pot, and the people who make decisions about such things are
> not necessarily as enlightened as we are.

    Then enlighten them!  If you buy a machine on which software development
    is difficult only because that machine is cheaper, you're not making a
    very good decision.  It's up to those who know about such things to
    educate those that "make the decisions" about the false economy of
    choosing a machine based only on price.

I don't know about you, but my software development would not be one cent
cheaper on an unaligned machine.  (My current machine, a PDP11, is aligned.)
Clean, portable software has no problem with such a machine.

> And once again, don't forget the example of the VAX: sure, it looks
> like a nice machine, but it's grossly overpriced for its market now.
> This is despite massive use of [semi-]custom ICs on the more recent
> VAXen -- and you would not believe what a disaster that is for
> production and maintenance!  (There is an awful lot to be said for
> using standard parts, which means restricting yourself to things that
> can be built economically with them.)

    Are you suggesting that it would have been better to build Vaxen from
    7400 series logic?  I think not.

Read that last parenthesized sentence again -- the point is not that you
should use standard parts even for jobs that they can't do, but that you
should restrict your jobs to things that standard parts *can* do.  It
actually is possible to implement a VAX in 7400's -- what do you think the
780 is made of? -- it's just hard and expensive.

> I have heard, from reliable sources, that if/when the successor to the VAX
> emerges, the biggest difference will be that it will be much simpler.

    That's the interface again, not necessarily the implementation.

See earlier for comments on the visibility of implementation complexities.
> If you can show me a way to eliminate alignment constraints without a
> speed penalty, WITHOUT adding large amounts of hardware (which I could
> use better to make the aligned version faster), I'd love to hear about
> it.  It's hard.

    Page 200 of the Vax Hardware Handbook describes how it is done with a
    cache on the 780.  The same can be (is!) done with data, using a larger
    cache to compensate for the less sequential nature of data accesses.

Please read your Vax Hardware Handbook more carefully.  The same is *not*
done with data, unless the data item fortuitously happens to fall within an
8-byte-aligned doubleword.  A data item that straddles an 8-byte boundary in
memory will take two fetches, i.e. a speed penalty.  Nor is the presence of
the cache an "out": caches cost hardware.  Lots.  The VAX has already paid
this particular price for other reasons -- stupid memory-system interface
design -- but this doesn't make the cache free.  Well-designed machines in
this speed range don't need caches at all.

> But actually, most of this is beside the [original] point.  We are not
> talking about some decision which makes life a lot harder for the poor
> software jockey.  We are talking about a decision which requires more
> memory to get equivalent performance.  There is a perfectly straight-
> forward hardware-vs-hardware tradeoff here: is it cheaper to build
> a machine that doesn't care about alignment, or to just stack more
> memory on a machine that does care?  I would give long odds that the
> latter approach wins, even just on initial cost.  When you think about
> things like reliability and maintenance, it wins big.

    Good point.  However, for programs that dynamically allocate space, the
    size of the largest problem that can be handled is determined by how
    efficiently you use that space.  For ANY given memory size, the program
    that more efficiently uses the space can handle larger problems.
    Of greater concern, though, is where all that data resides when the
    program is not actually running.  Can you add additional disk space to
    hold all of the wasted space in your data structures as cheaply as you
    can add main memory?  I would be upset if I had to buy another drive
    because all of my existing ones were full of data that was 25% wasted
    space.  Sure, you can pack them on disk and unpack them when you read
    them, but you are then trading away execution efficiency.

You have not refuted my point at all.  Granted that adding memory (be it
main memory or disk) costs money, it's still cheaper and simpler than adding
unaligned-operand hardware.

> I agree that this doesn't help the poor people who have made a big
> investment in data structures that assume no alignment constraints.
> These people have made a mistake, period: they have imbedded a major
> machine-dependent assumption in software that obviously should have
> been portable.

    This is my whole point -- alignment should NOT be machine dependent.

There is an anecdote attributed to Abraham Lincoln.  He asked a riddle: "If
you call a dog's tail a leg, how many legs does a dog have?"  The correct
answer is: "Four.  Calling the tail a leg doesn't make it one."  The fact
is, alignment *IS* machine dependent, and all the wishing in the world won't
change it.  To quote Dave Martindale:

    You make software which is portable by making it independent of the
    details of the implementations of a wide variety of machines, not by
    arguing that a certain class of those machines should not be allowed
    to exist.

-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
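[Editorial footnote to the thread: Spencer's point that alignment *is*
machine dependent shows up directly in structure layout.  A sketch in
present-day C -- the struct names are illustrative -- shows the compiler
inserting whatever padding the target machine's alignment rules demand, so
the same declaration can occupy different sizes on different machines, and
reordering members is the portable way to reduce the waste:]

```c
#include <assert.h>
#include <stddef.h>

/* On a machine that aligns longs, the compiler pads after `tag` and
 * after `flag`; on a machine with no alignment constraints it need not.
 * The declared content is identical either way. */
struct padded {
    char tag;      /* likely followed by padding up to the long's boundary */
    long value;
    char flag;     /* likely followed by tail padding */
};

/* Same members, widest first: the holes largely disappear. */
struct repacked {
    long value;
    char tag;
    char flag;
};
```

[This is the "25% wasted space" from earlier in the thread in miniature:
the waste is real, but it is a property of the target machine plus the
declaration order, not of the data itself.]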