guy@rlgvax.UUCP (Guy Harris) (04/21/84)
> We currently have a pyramid 90x which I am evaluating in my copious spare
> time. It's configured with 4M main, and ~900M disk. We do not have
> data cache installed yet. So far, a great deal of fun has been had, but
> the 90x faults on non-long-aligned 32-bit memory operations. Any comments,
> questions, opinions, etc, would be of interest. Especially regarding the
> above item. Also, I am interested in exchanging benchmark stats (vs
> 780, 750). Lastly, I am curious about up-and-coming architectures (esp.
> DEC) with the above constraint.

The Motorola M68000 CPU chips (MC68000 and MC68010, at least) fault on non-word-aligned 16-bit or 32-bit memory operations. The 360 would also fault on non-long-aligned 32-bit operations or non-(half)word-aligned 16-bit operations; the 370 (and 360/85) wouldn't, *but* would run slower if the operand wasn't properly aligned. I believe the same is true on the VAX; i.e., you can put things off the right boundary, but you pay for it.

Since C puts things on the right boundary, and since you're better off putting them there in a lot of cases even on machines which permit you to do otherwise, I presume Pyramid figured it wasn't worth the trouble to permit unaligned operands. I suspect it'll be a cold day in June before any VAXen refuse to support unaligned operands (compatibility and all that), but I suspect any other machines DEC is working on (like some RISC project supposedly being cooked up) may not feel any obligation to support them.

I can't speak for most other VAX-class superminis (such as the Ridge 32), but there *is* at least one other 32-bit supermini which requires proper alignment: the CCI Power 6/32. (It's not announced yet, but it's *very* fast. Keep your eyes peeled. - unpaid commercial announcement) When we ported UNET to our 6/32, we did get bit by that one, but it was simple to fix.
(The code was using a "short *" to point into a buffer and either casting or assigning that pointer to "long *" and pulling a 32-bit quantity out of that buffer. It probably never failed on other machines because most of them are 16-bit or 16/32-bit machines and only require 16-bit alignment, or are 32-bit machines but don't require any alignment.)

I suspect we took the same tack; namely, C puts things on the right boundary, and there's no guarantee that unaligned operands work on all machines, so portable code won't assume that they do - so there's no point in putting effort into supporting unaligned operands.

Consider that lots of machines read the contents of a memory buffer register onto some internal 32-bit data bus; if the operand is aligned, you need not monkey with it, but if it's unaligned you have to fetch the next longword, combine the two longwords, shift by the appropriate number of bytes, and pull out the appropriate 32 bits and put them onto the appropriate data bus. If you don't really need all this fanciness, why bother?

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
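The failure pattern described above can be sketched in a few lines of C (function names here are illustrative, not taken from UNET):

```c
#include <string.h>

/* The bug: a "short *" into a buffer is cast to "long *" and
   dereferenced.  When the short pointer sits on an odd halfword, the
   long fetch is unaligned, and machines like the Pyramid or the
   CCI 6/32 fault on it. */
long unsafe_fetch(short *sp)
{
    return *(long *)sp;     /* may fault: no alignment guarantee */
}

/* The portable fix: copy the bytes out instead of dereferencing a
   possibly misaligned pointer.  memcpy never needs an aligned source. */
long safe_fetch(short *sp)
{
    long val;
    memcpy(&val, sp, sizeof val);
    return val;
}
```

On machines that tolerate unaligned fetches the two routines behave identically, which is exactly why the bug goes unnoticed until the code meets a machine that doesn't.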
bob@hhb.UUCP (04/23/84)
My company created and now sells the CADAT digital simulation system, which incorporates hierarchical concurrent fault simulation capabilities. The software package represents at least 20 man-years of effort and is not trivial in any respect. The problem is this: it was originally written in C and was first implemented for a 16-bit architecture. Since we are dealing with such vast amounts of information to describe the topology of a circuit, we went to great pains to make our data structures as lean as possible. Of course this meant that many of our data structures were `hand' built via pointer arithmetic and such.

A typical example is a data structure we use to describe one device in a circuit (i.e. and, nand, nor, jk-flop, ...). It has a fixed header, but the data that follows it is variable in length, with its structure being derivable from the information in the header. If we went to using C structures to define this information, the padding done by the compiler would increase each device entry's size by 25%. There can be millions of devices in a single circuit. When we are simulating VLSI components, we often have `device tables' which have sizes of 2-3 meg. This 25% increase in space could have a serious impact on our performance if we then started being paged by the system. In digital simulation, performance is the name of the game because test engineers just aren't cheap. I know, memory is `cheap', but no matter how much memory I have, this simulation system will want to use it, and paying a 25% penalty for `structured programming' is hard to stomach.

We were doing fine..... The system now runs under 12 different versions of UNIX on both 68000's and Vaxen. It also runs under VMS. We hit our first snag on the Pyramid system we are evaluating. It compiled fine.. but you know the rest..... We `core dumped' when we accessed a long word on a non-quad-byte boundary. We also found that the Ridge will exhibit the same behavior as the Pyramid, along with the CCI Power 6/32.
We now have a dilemma. Spend 10 man-months to rewrite the data structures we have for these 3 machines and incur the 25% memory penalty? Or just avoid architectures of this type? How many architectures of the future will have this constraint? A machine we are looking at now for a CADAT implementation is the ELEXSI. It is a super-mini with IBM mainframe performance, but it offers Vaxen-like addressing and won't pose any problem to us.

What are the opinions in net-land.. Should we re-write or just ignore the (few ?) machines with this constraint?

=======================================
Be
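The kind of hand-packed device entry described above might look like the following minimal sketch (the header fields, their widths, and the "pin index" payload are all invented for illustration; CADAT's real layout is not public). Reading the variable part through memcpy keeps the packed layout, with no compiler padding, while avoiding the unaligned long dereference that faults on the Pyramid:

```c
#include <string.h>

/* Hypothetical fixed header for one device (and, nand, jk-flop, ...).
   The variable-length data follows it immediately, back to back with
   the next entry, so longs in the payload can land off a 4-byte
   boundary. */
struct dev_hdr {
    short type;     /* device kind                          */
    short npins;    /* how many long pin indices follow     */
};

/* Fetch the i-th pin index from a packed entry without ever doing an
   unaligned long access: memcpy moves the bytes regardless of the
   source address's alignment. */
long get_pin(char *entry, int i)
{
    long pin;
    memcpy(&pin,
           entry + sizeof(struct dev_hdr) + i * sizeof(long),
           sizeof pin);
    return pin;
}
```

The space cost is zero; the price is a copy per access instead of a direct load, which is the speed-vs-generality trade the rest of the thread argues about.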
kar@ritcv.UUCP (Kenneth A. Reek) (04/24/84)
Regarding the question of whether to invest 10 man-months to rewrite a large system to make it run on machines that require boundary alignment of data, or to just ignore those machines...

Ignore those machines and maybe they'll go away. Architectures that require aligning multi-byte quantities on particular boundaries sound to me like they were designed by an engineer who was interested in simplifying his own task at the expense of the software designers who will use the machine. The computer is designed once, but there are an infinite number of programs that might be run on it. This short-sighted engineering was OK for the early 360's, but is not appropriate for modern computers, and if we software types tolerate it, it is likely to keep happening. Ergo: don't buy any of these machines, and maybe people will stop making them.

	Ken Reek, Rochester Institute of Technology
	{allegra,seismo}!rochester!ritcv!kar

PS: Allowing non-aligned data to be accessed with a performance penalty is only a little less short-sighted. When designing a computer, anything that makes the job of writing software easier will be justified in the marketplace, especially the OEM marketplace. Given the current comparison of hardware costs to software (i.e. people) costs, a more expensive CPU that is easier to program will be vastly cheaper in the long run.
chris@umcp-cs.UUCP (04/25/84)
Simple solution: write a subroutine to access longwords that might not be aligned. Since subroutine calls are so fast on the Pyramid, you'll probably still run faster than a Vax....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
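A minimal version of the subroutine Chris suggests: assemble the 32-bit value a byte at a time, so no unaligned longword access ever reaches memory. (Little-endian assembly order is assumed here purely for illustration; a real version would match the target machine's byte order.)

```c
/* Fetch a 32-bit value from an arbitrary, possibly unaligned address
   by reading single bytes and shifting them into place. */
unsigned long fetch_ulong(unsigned char *p)
{
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}
```

Byte accesses are always aligned by definition, so this works on every byte-addressed machine, strict alignment or not.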
stan@clyde.UUCP (04/25/84)
Sounds like it should have been a compiler option to have the data structures aligned or not. (I realize that I'm RISC-king my neck by saying that, though.)

	Stan King			phone: 201-386-7433
	Bell Labs, Whippany, NJ		Cornet: 8+232-7433
	room 2A-111			uucp: clyde!stan
dmmartindale@watcgl.UUCP (Dave Martindale) (04/26/84)
> PS: Allowing non-aligned data to be accessed with a performance penalty is
> only a little less short-sighted. When designing a computer, anything that
> makes the job of writing software easier will be justified in the
> marketplace, especially the oem marketplace. Given the current comparison
> of hardware costs to software (i.e. people) costs, a more expensive cpu
> that is easier to program will be vastly cheaper in the long run.

There are always going to be tradeoffs. Allowing the largest data type accessed by the machine to be accessed on any boundary with no performance penalty requires that the data path from memory be twice as wide as the largest operand fetch that will be done from it, and the presence of a data aligner which is fast (probably combinational logic) and also that wide. On the VAX, for example, where the longest data type is 8 bytes (ignoring H-floating for the moment), you'd need a 128-bit-wide data path from memory. That's four times its current width; frankly, I'd rather see the extra money spent on features that would speed the machine up all the time, not just when doing unaligned data references.
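What that data aligner does can be modeled in software (a sketch only: a 32-bit word fetch and little-endian byte order are assumed here for illustration). An aligned access costs one word fetch; an unaligned one costs two fetches plus a shift-and-merge, which is exactly the work the wide data path and combinational aligner would absorb in hardware:

```c
#include <stdint.h>

/* Software model of an unaligned-fetch data aligner.  mem is an array
   of aligned 32-bit words; byte_addr is an arbitrary byte address. */
uint32_t aligner_fetch(const uint32_t *mem, unsigned byte_addr)
{
    unsigned word = byte_addr / 4;   /* aligned word holding byte 0 */
    unsigned off  = byte_addr % 4;   /* operand's offset within it  */
    if (off == 0)
        return mem[word];            /* aligned: one memory cycle   */
    /* unaligned: fetch the two straddling words, shift, and merge */
    return (mem[word]     >> (off * 8))
         | (mem[word + 1] << ((4 - off) * 8));
}
```

The branch makes the cost asymmetry visible: the unaligned path touches twice as much memory and does extra shifting every time.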
wm@tekchips.UUCP (Wm Leler) (04/26/84)
1 - I have heard Dr. Fred Brooks state that it was a mistake to allow non-aligned word accesses in the IBM/370.

2 - I don't understand why you must pay a 25% space penalty to use the Pyramid machine. Maybe rewrite your accessing functions, but why change your data structures? In the worst case you could grab everything a byte at a time and assemble it. Not that you should do this; I was just trying to show that a machine that requires word alignment can handle any data structure.

3 - <enter sarcasm mode> Well, if you are complaining about machines that require word alignment, how about all those machines out there that require *byte* alignment! I want to be able to store my double precision floating point numbers starting with any bit in memory I desire! What about the waste when C programmers use ints (32 bits long!) for boolean flags? Or all those structures that contain padding? Wouldn't this solve the problem of structure comparisons? And I know how many bits wide my integers should be. I should be able to have 19-bit integers, or 129-bit floats. Foo on alignment. I mean, you hardware guys are making my job as a software hacker much harder. Like someone said, making a machine cheaply at the expense of making software harder to write is a big lose. <exit sarcasm mode> :-O :-) ;-)

Please don't send me mail about bit-aligned machines. I already know about them.

	Wm Leler	503/627-5151
	wm.Tektronix@csnet-relay
	{ucbvax|allegra|decvax|ihnp4}!tektronix!wm
henry@utzoo.UUCP (Henry Spencer) (04/27/84)
Kenneth Reek contends, in part:

> Architectures that require aligning multi-byte quantities on particular
> boundaries sound to me like they were designed by an engineer who was
> interested in simplifying his own task at the expense of the software
> designers who will use the machine.

Simple hardware wins on more counts than just making life easier for lazy engineers. It is simpler and cheaper to build, simpler for the software to run (comparing an 11/44 to the nightmare complexity of the VAX i/o structure, for example, tells you why 4.2BSD is so bloated), more reliable, easier to fix when it breaks, etc etc.

Don't forget that magic word "cheaper". It has become fashionable to say "software costs totally dominate hardware costs", but most people forget to add "...unless you can't afford the hardware in the first place". Hardware and software money don't usually come out of the same pot, and the people who make decisions about such things are not necessarily as enlightened as we are. And once again, don't forget the example of the VAX: sure, it looks like a nice machine, but it's grossly overpriced for its market now. This is despite massive use of [semi-]custom ICs on the more recent VAXen -- and you would not believe what a disaster that is for production and maintenance! (There is an awful lot to be said for using standard parts, which means restricting yourself to things that can be built economically with them.) I have heard, from reliable sources, that if/when the successor to the VAX emerges, the biggest difference will be that it will be much simpler.

> Allowing non-aligned data to be accessed with a performance penalty is
> only a little less short-sighted. When designing a computer, anything
> that makes the job of writing software easier will be justified in the
> marketplace, especially the oem marketplace.
If you can show me a way to eliminate alignment constraints without a speed penalty, WITHOUT adding large amounts of hardware (which I could use better to make the aligned version faster), I'd love to hear about it. It's hard.

> Given the current comparison of hardware costs to software (i.e. people)
> costs, a more expensive cpu that is easier to program will be vastly
> cheaper in the long run.

See my first comments for some reasons why the software will be easier if the hardware is simpler.

But actually, most of this is beside the [original] point. We are not talking about some decision which makes life a lot harder for the poor software jockey. We are talking about a decision which requires more memory to get equivalent performance. There is a perfectly straightforward hardware-vs-hardware tradeoff here: is it cheaper to build a machine that doesn't care about alignment, or to just stack more memory on a machine that does care? I would give long odds that the latter approach wins, even just on initial cost. When you think about things like reliability and maintenance, it wins big.

I agree that this doesn't help the poor people who have made a big investment in data structures that assume no alignment constraints. These people have made a mistake, period: they have imbedded a major machine-dependent assumption in software that obviously should have been portable. The merits of the assumption are debatable; what is not debatable is that it shouldn't have been wired in so deeply! They have asked whether they should spend 10 man-months recoding to imbed the opposite assumption, i.e. alignment constraints. I think they should spend however much time it takes to eliminate such deeply-wired-in assumptions completely. Or they will surely be bitten by something like this again some day. (Case in point: does the code assume that a long int is 32 bits?
Might be a mistake if they ever want to move it to an Amdahl Unix -- and the big Amdahls are supposed to be pretty good Unix machines if you need lots of crunch.)

[I can just hear N netnews readers firing up their afterburners to accuse me of being a degenerate anti-software hardware hacker... I'm a software specialist, degree in Computer Science (specifically, compilers), experience mostly in software. But I know a good deal about how hardware works and about the practical aspects of building it, and have personal experience with some of the problems.]
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
thomas@utah-gr.UUCP (Spencer W. Thomas) (04/28/84)
> Ignore those machines and maybe they'll go away. Architectures that
> require aligning multi-byte quantities on particular boundaries sound to me
> like they were designed by an engineer who was interested in simplifying his
> own task at the expense of the software designers who will use the machine.
> The computer is designed once, but there are an infinite number of programs
> that might be run on it. This short-sighted engineering was OK for the early
> 360's, but is not appropriate for modern computers, and if we software types
> tolerate it, it is likely to keep happening.

A current trend in computer design is to assume that the user will only be writing in a high-level language, and that the compiler will do the hard work of generating machine code. This is the theory behind RISC machines, in particular. Making the hardware simpler makes it run faster. Once we start getting really convoluted machines (such as ELI, or some pipelined machines which execute several instructions after a conditional branch, before actually branching), all your clever hacks based on assumptions about the hardware will just go straight down the tubes. If the compiler were smart enough, it would say "Oh, he's trying to access a longword, but it's not on the right boundary", and generate a byte move instruction to align it before accessing.

The basic problem is that generality is slower. For really blinding speed, you always have to give up something. With big memories, arbitrary alignment is not too hard to give up. (I bet that the original application never put longwords on odd byte boundaries, now did it?)

=Spencer
eric@gang.UUCP (Eric Kiebler) (05/01/84)
[Nuke the smurfs]

Since when does having nice hardware and well thought-out software make any difference? People have been buying machines with horrible software and lousy architectures for years, and continue to do so. In fact, I heard a rumor that "they" are producing a particular horrible machine with a lousy architecture at the rate of 1 every 16 seconds. As long as lousy hardware and horrible software get the job done, however, people are happy. Inertia is a wonderful/bad thing. Until machines are sufficiently sophisticated that increasing performance is *not* an issue, people will tweak and prod and make trade-offs that are not in their best long-term interest. Meanwhile, we all push our hair out in little gray bunches...

eric
-- 
from the gang down at...	38.37.45 N  90.12.22 W
..!ihnp4!afinitc!{gang|wucs!gang}!eric
Any sufficiently advanced technology is indistinguishable from a rigged demo.
Copyright (C) 1984 All Rights Reserved.
bob@hhb.UUCP (05/02/84)
It's time for a response. When I posed my question regarding the problem we were having with the Pyramid system, I really wasn't looking for technical solutions to the problem, and I wasn't looking for replies telling me that we had just plain written the code wrong (PERIOD). (Implied in that statement was that we must be bumbling fools who shouldn't be allowed behind the wheel of a C compiler.) All I was looking for was whether this type of architecture would be dominant in the future, so that if it was, we could schedule a fix for it.

Now for a credibility speech. My company and I have been involved with UN*X for quite a while. We have done two native ports to a word-addressed computer, one of them UN*X v7m, and the other System III. We have also done 3 C compiler code generators, 2 for word-addressed machines and the other for the HP 1000 series machines (2 registers, no byte addressing!). We also did the XENIX adaptation for the IBM Instruments CS9000 computer. I feel we are quite competent at what we do, and have an excellent knowledge of compilers and machine architectures.

Now for a plea. Please - no more solutions to our problem, like -- write subroutines to fetch the bytes individually, use macros, trap the fault in the kernel and fix it there........ and on and on and on... We've already thought of the different ways to fix it, and feel, as stated, that to do it CORRECTLY would take 10 man-months.

Now for some exhaust from my after-burners ----->>>>>>>>>>********

Now let me flame at the folkz who felt compelled to tell me that we had written the code completely wrong. These responses were just typical (and as I had expected) of UN*X snob types with little understanding of what it takes to develop major software systems. With attitudes like that we ought to just throw most of UN*X out the window.
Do you have any idea how much effort we spent making the UN*X utilities work on a machine that did not have character pointers the same size as all other pointers? (This was for the word-addressed machine I previously mentioned.) It was months, and an extremely tedious job. So obviously they wrote UN*X wrong, PERIOD. No, you're right, it was written wrong, from a purely technical, non-realistic viewpoint.

Maybe those who write code perfectly and think of ALL considerations before they code do not have deadlines, schedules, and real-world problems to address. I'm also sure that they have plenty of time to conduct seminars for all the new people they hire, espousing to them the perfect way to write C code. But then again, maybe the most ambitious system they have developed is comparable to a `grep' program. Or else maybe they have large contracts from the government. (I worked for a defense contractor once....)

Did we make assumptions about the architecture of the machine we would run on? Yup; for instance, we'll never run on an 8086 microprocessor, because we assume we have a linear address space greater than 64K. (Unless someone comes out with a true large-model compiler for it.) And we won't lose much from that decision. It boils down to speed/space vs. generality, and we chose a compromise position. It bit us with the RISC machines, but unless they are the wave of the future (which was the original question I was posing) we won't disrupt our development cycle for them.

Anyways, what the flamers ignored was that we now run on 12 different systems, and will run on many more to come, without mods to our system. So I fail to see that we wrote it completely wrong. And I admit, it won't port to a 67-bit flammigabar machine, but it sure seems to be a useful product in the market it enjoys. So how about coming out of your ivory towers and just try to put things in perspective.
==========================================================
Be
Company:	HHB-Softron
UUCP address:	{decvax,allegra}!philabs!hhb!bob
pete@lvbull.UUCP (pete) (06/21/84)
Hear, hear for UNIX hacks for those old word-addressed machines! I would rather run a hacked UNIX than GCOS/mod400, for example. I have also been battling with word-addressed machines for a few years now and can appreciate the problems. Maybe we should exchange software that has been hacked for word-addressed machines; e.g., my 4BSD networking on a V7 filesystem. Anyway, it would be nice to see word-addressing bugs removed from the UNIX sources; it gets a little boring making the same hack in the kernel in iomove(), uiomove(), again and again. Will Bell and Berkeley accept word-addressing fixes as worthwhile? I believe all the networking hacks were in mbuf.h and sys_inode.c.