kck@dana.UUCP (04/13/87)
> >>> >>>> ...the unaligned-access trap facility exists in order to provide some >>>> level of support for "old" databases. That is, lots of machines allow >>>> access to any size of data on any boundary (unaligned 32-bit words, >>>> unaligned 16-bit halfwords). There must (we guess) be many databases >>>> out there that were created under the assumption that such access will >>>> always be possible... >>> >>> 1) Do such databases *really* exist? > > Er, well, I have no proof, but gosh, we just took it on faith.... We should > probably punish ourselves. > Such databases do exist. Specifically, HHB Softron has/had a CAE package called CADAT which required 32 bit loads/stores from arbitrary 16 bit alignment. >>> 2) Are they difficult to convert? > > This is a good question. With proper documentation, the conversion job > might be easy and quite perferrable to "support" of the old format. > When Ridge Computers (my previous employer) tried to port CADAT to the Ridge architecture, which requires 16 bit alignment for 16 bit loads/stores, 32 bit alignment for 32 bit loads/stores, and 64 bit alignment for 64 bit load/stores, they failed due to these alignment problems. When Pyramid Technology tried to port CADAT to their architecture they also failed due to alignment problems. Pyramid actually provided HHB Softron with a special version of their microcode which allowed non-aligned accesses. Last I heard HHB was still running with this "non-standard" Pyramid microcode. >>> 3) Will they perform acceptably with all that trap overhead? >>> Depends on your definition of acceptable. In the case of CADAT on a Ridge we decided the performance degradation wasn't worth the effort to provide for non-aligned accesses. >>> 4) Would it be easier to get the compiler to generate >>> whatever sequence of instructions would be required to >>> access unaligned operands without a trap? > > The compiler (often/always) can't know the alignment of operands. Thus, > it would have to generate slow code for all accesses. The trap mechanism > lets at least 32-bit-aligned-32-bit accesses run at full speed. > Correct. >>> 5) Did the idiots who designed such non-portable databases >>> make other mistakes that will kill you anyway? (Byte order, >>> anyone?) > The people at HHB weren't idiots. They were programmers constrained by small address space machines who did what they could to provide some useful functionality. Would you rather they have postponed development of their product until they had an architecture that supported a decent sized address space? Or would you suggest they had spent more money for a machine with a larger address space? Sometimes the real world throws constraints at you that you can't ignore. Ken Klingman Dana Computer hplabs!dana!kck
baum@apple.UUCP (04/13/87)
-------- [] In reference to making complicated things go as fast as simple things: it is rumored that IBM deliberately put wait loops into the 370/145 microcode, so that when the virtual memory was added to make them 370/148s, they wouldn't run any slower! -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
franka@mmintl.UUCP (04/15/87)
In article <16038@amdcad.AMD.COM> bcase@amdcad.UUCP writes: >Ok, I think maybe it *is* time to start the discussion of word-addressed >vs. byte-addressed memory systems. John Mashey, care to start us off? While we're at it, lets talk about bit-addressed systems, too. Frank Adams ihnp4!philabs!pwa-b!mmintl!franka Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
mlord@bnr-public.uucp (Mark Lord) (03/23/89)
w-colinp@microsoft.uucp (Colin Plumb) writes: >In any structure, if you rearrange the components, you can lose at most >n-1 bytes to padding, where n is the strictest alignment restriction. For >most processors, the worst case is a double and a char, 7 bytes out of 16 >wasted. But if this is a major concern, rewrite the code to use two parallel >arrays. You'll waste at most 7 bytes total (in your 100Meg). > The "break into multiple aligned arrays" approach can work fine as long as one is dealing with arrays (of structures) to begin with. Unfortunately, I suspect that this is not of much use with dynamically allocated linked lists of structures, which is very likely when we are talking about multiple megabytes of data. Granted, by careful (human) reordering of the fields within a complex structure, one can almost always reduce wasted space to less than one machine word/line per allocated item. This is a good thing that we all do when possible on new code, but it is much more difficult (and risky) to go back and try to do this with code in a very large existing base of (otherwise) good, working software. The debate for devoting more silicon space to support efficient handling of misaligned accesses seems to hinge around supporting the huge amounts of perfectly good software that already exist, written before programmers in general became keenly aware of alignment/efficiency tradeoffs. I feel that this must be addressed by chip designers NOW. The MIPS R2000 R3000 family have included instructions for this purpose, LWL/LWR, SWL/SWR, but these must usually be used in pairs to achieve the desired effect. A big problem with such is atomicity of load/store operations, especially nasty when the software in question is also being used on one or more CISC machines which never exhibit this problem. Still, MIPS has at least provided a starting point for misaligned accesses, while other chip makers have yet to address the problem at all (ie. MC88100). -Mark ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * The Company doesn't even know I'm reading this stuff, let alone * * writing it (maybe THAT's why I'm not working on it anymore!). * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
henry@utzoo.uucp (Henry Spencer) (03/25/89)
In article <362@bnr-fos.UUCP> mlord@bnr-public.UUCP (Mark Lord) writes: >The debate for devoting more silicon space to support efficient handling >of misaligned accesses seems to hinge around supporting the huge amounts >of perfectly good software that already exist, written before programmers >in general became keenly aware of alignment/efficiency tradeoffs. How quickly we forget. The lack of attention to alignment/efficiency tradeoffs was a temporary aberration in the late 70s and early 80s. I assure you that those of us who wrote code for such unimportant, commercially-insignificant machines as the 360, the PDP11, and the 68000/010 were keenly aware of such tradeoffs from the beginning, because the hardware gave us no choice. I continue to think that the quantity of good software which disregards alignment is grossly overestimated. (It is not zero, and attention to it is necessary, but the idea that a random software package has a 90% chance of suffering from such carelessness is unjustified.) -- Welcome to Mars! Your | Henry Spencer at U of Toronto Zoology passport and visa, comrade? | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
w-colinp@microsoft.UUCP (Colin Plumb) (03/25/89)
mlord@bnr-public.UUCP (Mark Lord) wrote: > The "break into multiple aligned arrays" approach can work fine as long > as one is dealing with arrays (of structures) to begin with. Unfortunately, > I suspect that this is not of much use with dynamically allocated linked > lists of structures, which is very likely when we are talking about multiple > megabytes of data. Yes, but if you have at least one 32-bit pointer per code item, odds are very good your other data is much more than 32 bits long. So again, the waste (limited to 3 bytes, usually) is trivial. > Granted, by careful (human) reordering of the fields within a complex > structure, one can almost always reduce wasted space to less than one > machine word/line per allocated item. This is a good thing that we all do > when possible on new code, but it is much more difficult (and risky) > to go back and try to do this with code in a very large existing base > of (otherwise) good, working software. Actually, it can be done by machine. A simple working algorithm is to sort the elements by size, starting with the largest. And, as I said, the wasted space has an upper bound of (greatest alignment restriction)-(size of smallest data item), usually 4 and 1 bytes, respectively. Sometimes 8 and 1. If the structures aren't dumped, raw, to a file, I would have no hesitation reordering structures in my code. > The debate for devoting more silicon space to support efficient handling > of misaligned accesses seems to hinge around supporting the huge amounts > of perfectly good software that already exist, written before programmers > in general became keenly aware of alignment/efficiency tradeoffs. Given that that software needs to be recompiled anyway, and is resistant to big/little endian changes (i.e. doesn't use disgusting tricks), I don't see how it can have much dependency on struct packing. > I feel that this must be addressed by chip designers NOW. The MIPS R2000 > R3000 family have included instructions for this purpose, LWL/LWR, SWL/SWR, > but these must usually be used in pairs to achieve the desired effect. > A big problem with such is atomicity of load/store operations, especially > nasty when the software in question is also being used on one or more CISC > machines which never exhibit this problem. Still, MIPS has at least > provided a starting point for misaligned accesses, while other chip makers > have yet to address the problem at all (ie. MC88100). CISC processors have exactly the same problems. They still have to break it up into two accesses. The only problem is that you don't have the flexibility. If the second one fails, the VAX will restart both accesses, while a 68020 will only reexecute the second. In both cases, an arbitrarily long time can pass between the first (the first first, in the case of the VAX) access and the second. How is this different from what a RISC chip provides? I don't quite understand the problem. Alignment restrictions are almost zero hassle in most code, are enforced by compilers even on processors which don't beed them for efficiency reasons (the Microsoft C compiler inserts pad bytes in the instruction stream to get the target of a branch instruction word-aligned), and simplify the hardware tremendously. It seems like win-win. I never really used what I'm giving up, anyway. -- -Colin (uunet!microsoft!w-colinp)
cik@l.cc.purdue.edu (Herman Rubin) (03/25/89)
In article <59@microsoft.UUCP>, w-colinp@microsoft.UUCP (Colin Plumb) writes: > mlord@bnr-public.UUCP (Mark Lord) wrote: > > The "break into multiple aligned arrays" approach can work fine as long > > as one is dealing with arrays (of structures) to begin with. Unfortunately, > > I suspect that this is not of much use with dynamically allocated linked > > lists of structures, which is very likely when we are talking about multiple > > megabytes of data. .......................... Here is an actual example where unaligned is at least highly desirable. One has an array of bits; we may assume byte alignment is free. It is necessary to take a 24-bit block at a time to the right 24 bits of a 32-bit word. These 24 bits can be anywhere in the block, but this has to be done repeatedly, and can be vectorized if desired. Bits used may not be reused. It is also necessary to keep track of pointers, as the input array may need to be refilled, almost without warning and possibly at an unpredictable time. The refill procedure is somewhat costly. I believe that the penalty for unalignment will be cheaper than anything else if the necessary hardware is there. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
greg@sce.carleton.ca (Greg Franks) (03/26/89)
People debating the merits of unaligned accesses with various people at BNR saying they are useful... Having worked in the telephone industry (though not at BNR), I can relate to their desire to have the processor handle unaligned accesses. Why? Simple - the applications that they are dealing with are pieces of switching equipment that have hundreds of processors of various shapes and sizes. They all like to pass binary data to and from each other. Needless to say, the alignment requirements on a 6800 are rather different from those on a 88000. Furthemore, parts of the system are upgraded at different times - why toss out $1,000,000 of smart line cards when you are are updating your $10,000 processor. Just to make things difficult, communication channels may complicate life. On the switch that I worked on, we could fire 32 bytes of data through the message switch to any other processor on the system. Aligning the data may cause the message to grow above 32 bytes thus requiring another message. Our message system did not guarantee delivery, so just imagine the extra cost of tacking a protocol onto the system to ensure both messages arrived at their destination. The loss of one message is no big deal, the loss of half a message is. To continue, lots and lots of code on the switch was written in assembler language. If the world were perfect, one could change offsets by simply tweaking .h files. But we all know that the world is not perfect. :-(. Finally, telephone switches that I am familiar with do not have virtual memory. It costs money and it complicates real-time response. Oops - we just paged out dial tone - we'll have to wait a couple of seconds... :-) (not a real example...). Adding memory is expensive and a pain. We spent a good half year PACKING data (by shuffing fields around, changing the compiler etc) just so we would not have to jam in more memory cards. Memory isn't cheap, and bus slots may not be available. (All cards are designed in-house, and fit on a special bus. Economies of scale aren't there like they are in klones-R-us). Sorry that I've bored you to death. -- Greg Franks Systems and Computer Engineering, utzoo!dciem!nrcaer!sce!greg Carleton University, Ottawa, Ontario, Canada. uunet!mitel!sce!greg greg@sce.carleton.ca from Ottawa Ont, where life in the fast lane is skating two inners!
mlord@bnr-rsc.UUCP (Mark Lord) (03/27/89)
In article <1989Mar25.013342.22214@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <362@bnr-fos.UUCP> mlord@bnr-public.UUCP (Mark Lord) writes: >>The debate for devoting more silicon space to support efficient handling >>of misaligned accesses seems to hinge around supporting the huge amounts >>of perfectly good software that already exist, written before programmers >>in general became keenly aware of alignment/efficiency tradeoffs. > [stuff I agree with deleted] >... I continue to think that the quantity >of good software which disregards alignment is grossly overestimated. >(It is not zero, and attention to it is necessary, but the idea that >a random software package has a 90% chance of suffering from such >carelessness is unjustified.) I partially agree with Henry here, but would like to point out that over the past six years or so, a lot of systems, including a very large (many millions of lines) system with which I am intimately familiar, have taken the migration path from 16-bit to true 32-bit processors. Such software tends to be quite well aligned on 16-bit boundaries, but not always well aligned on 32-bit boundaries. Again, this thread is dealing with very large (10's of megabytes) of allocated data, tightly packed into structures of related fields. Yes, such software could be revisited and aligned properly for 32-bit or 64-bit (or whatever-bit) processors, but only at great expense in both manpower and risk to a perfectly good working package. From my experience, I do agree that alignment on 16-bit boundaries is probably not at issue here. But a lot of software is still being moved into the 32-bit age from 16-bit systems. This will probably not continue beyond another ten years or so, but it would be nice to have the hardware support for it while it is still important. Cheers -Mark ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If I were writing this for the Company, I'd demand a lot more money for it. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlord@bnr-rsc.UUCP (Mark Lord) (03/27/89)
In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes: [lines from my article deleted] > >Yes, but if you have at least one 32-bit pointer per code item, odds are >very good your other data is much more than 32 bits long. So again, the >waste (limited to 3 bytes, usually) is trivial. Very trivial indeed for a single data item. Now consider a large number of such data items, dynamically allocated and maintained in data structures based on linked lists. Say.. ten million of them. This could cost between 10 and 30 megabytes of store, simply because everything was 32-bit aligned. [lines from my article deleted] > >Actually, it can be done by machine. A simple working algorithm is to sort >the elements by size, starting with the largest. And, as I said, the wasted >space has an upper bound of (greatest alignment restriction)-(size of smallest >data item), usually 4 and 1 bytes, respectively. Sometimes 8 and 1. > Okay, no problem. At least part of the process can be automated. Be careful, though, quite often structures are used as sub-fields of other structures, or are declared in minimal (public) form in interface modules, and then in extended (private) detail in the actual implementation modules (yes, there are very high level languages which support this and more..). Also, there have been clever programmers who did tricky and nasty things, such as over- laying one structure with another to achieve certain effects, such as outputing a stream of bytes to a block device very fast. Such things can be done in more portable ways at little speed expense, but remember, this is existing working code we are talking about, and the tricky programmers may have moved on to other areas/companies. Note the case of structures being used as sub-fields of other structures has potential to increase the alignment overhead (or else add considerable compiler overhead/headaches) in some cases, as one would probably want to ensure correct alignment of both structures. >If the structures aren't dumped, raw, to a file, I would have no hesitation >reordering structures in my code. > Or used as part of messages which are passed on to other devices/systems.. or overlaid with other structures by nasty code (see above). Such code is not always easy to find in a forest of 14 million lines. Especially when *none* of it is *my* code. [lines from my article deleted] > >Given that that software needs to be recompiled anyway, and is resistant >to big/little endian changes (i.e. doesn't use disgusting tricks), I >don't see how it can have much dependency on struct packing. > That's right. Just recompile it and it works. I like it! How come real life is seldom this simple? [lines from my article deleted] > >CISC processors have exactly the same problems. They still have to break >it up into two accesses. The only problem is that you don't have the >flexibility. If the second one fails, the VAX will restart both accesses, >while a 68020 will only reexecute the second. In both cases, an arbitrarily >long time can pass between the first (the first first, in the case of the >VAX) access and the second. How is this different from what a RISC chip >provides? I didn't actually have data faults in mind here. Of more concern were interrupts occuring between the multiple accesses, with the interrupt code using or modifying the data in question. Sloppy programming, by modern standards, perhaps, but we are talking about not-so-modern code here. > >I don't quite understand the problem. Alignment restrictions are almost >zero hassle in most code, are enforced by compilers even on processors >which don't beed them for efficiency reasons (the Microsoft C compiler >inserts pad bytes in the instruction stream to get the target of a branch >instruction word-aligned), and simplify the hardware tremendously. >It seems like win-win. I never really used what I'm giving up, anyway. Perhaps I've just had to deal with such problems a little too often over the past five years. Thanks for your viewpoint. Perhaps the world IS mostly sane. >-- > -Colin (uunet!microsoft!w-colinp) -Mark ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I made all of this up for my own benefit, not the Company's. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
les@unicads.UUCP (Les Milash) (03/28/89)
here's a question about "the large body of existing C code"... how many programmers assume that the address of a struct is the same as the address of its first member? or that the members are arranged in address space in the same order they appear in the struct definition in the source code? i wrote a program once that did that first evil, but now i realize that K&R never claims it'll work, and also there are legal ways around it that take no time or space. it seems like as long as programmers don't do that kind of junk, then (having a compiler) sort the members of a struct by size is painless and free. i personally have always been willing to do bizarre things in the name of performance (like keep 2 copies of registers--now That's unconventional, but if it helps, great.) advSORRYance if i screw this all up; i've barely ever posted before. let me say however, what a pleasure it's been to read your group for the past half-year; it's so refreshing to hear people who know how it works talk. most programmers (even highly-paid ones) wouldn't know which end of the soldering iron to hold (at least for about 60 seconds); most folks around here, if they bought more dynamic memory chips for their mac would ask why you have to malloc() it and can't use it for stack. thanks again for illuminating talk and the HOT hardware. those 98X, those R2000, those 88K, 29K, those Sun4, that VLIW, killer stuff. i can't believe how far we/YOU've come in the last 5 years. Running apps on a Sun4 with an Iris4D as the XServer hauls butt. Way faster than the Cyber I used to keypunch for, seems like. T--T Les Milash (w)303-443-6961 graphix s/w slave, Unicad Inc. /| /| (h)303-444-8552348 Arapahoe #17 Boulder CO USA 80302 T-+T | my opinions are so different from those of my employer :-( | T+-T that i often look for a different employer :-) |/ |/ and that's what these are. T--T
tim@crackle.amd.com (Tim Olson) (03/28/89)
In article <343@unicads.UUCP> les@unicads.UUCP (Les Milash) writes: | here's a question about "the large body of existing C code"... | how many programmers assume that the address of a struct is the same | as the address of its first member? or that the members are arranged | in address space in the same order they appear in the struct definition | in the source code? i wrote a program once that did that first evil, | but now i realize that K&R never claims it'll work, and also there | are legal ways around it that take no time or space. it seems like | as long as programmers don't do that kind of junk, then (having a | compiler) sort the members of a struct by size is painless and free. Unfortunately (?) it is not allowed. Both K&R (section 8.5) and the pANS (section 3.5.2.1) say that within a structure, the objects declared have addresses that increase in the order in which they are declared. -- Tim Olson Advanced Micro Devices (tim@amd.com)
bruce@tigger.wv.tek.com (Bruce Robertson;685-2912;;;tigger) (03/29/89)
Historically, reordering of structure members was disallowed because C is so often used to talk to hardware. However, nothing says you can't write a utility to reorder your structure members at the C source level, optimizing for a particular architecture. Obviously, you would have to avoid using this utility if the order of structure elements matters (you wouldn't want to do it to the Unix kernel!) -- Bruce Robertson Domain: bruce@tigger.wv.tek.com UUCP: tektronix!tekecs!tigger!bruce -- Bruce Robertson Domain: bruce@tigger.wv.tek.com UUCP: tektronix!tekecs!tigger!bruce
slackey@bbn.com (Stan Lackey) (03/29/89)
In article <844@bnr-rsc.UUCP> mlord@bnr-rsc.UUCP (Mark Lord) writes: >In article <59@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes: >>CISC processors have exactly the same problems. They still have to break >>[unaligned accesses] up into two accesses. >>flexibility. If the second one fails, the VAX will restart both accesses, >>while a 68020 will only reexecute the second. In both cases, an arbitrarily >>long time can pass between the first (the first first, in the case of the >>VAX) access and the second. How is this different from what a RISC chip >>provides? >I didn't actually have data faults in mind here. Of more concern were >interrupts occuring between the multiple accesses, with the interrupt code Nope! Processors typically test for interrupts between instructions, not during. Unless it is one of the VAX continuable instructions, in which case it can take one but only after all specifiers are evaluated (and fetched if appropriate). This is architecturally defined. (Disclaimer: Other architectures that support continuable instructions may do other things.) Page faults are the only case (that I can think of right now anyway) that can cause this class of situation. An interrupt could be taken (after the fault frame is pushed), but by now that's beside to point. I can't imagine any processor that would go to the trouble of taking an interrupt at any arbitrary time. It would save a few cycles to recognize the interrupt earlier, at the cost of many many cycles to save all internal state. Much added complexity for only a slight reduction (not a typo) in performance! >>I don't quite understand the problem. Alignment restrictions are almost >>zero hassle in most code, are enforced by compilers even on processors Then, why were the suppliers (IBM, DEC, Motorola, etc) pressured so much by programmers to add this feature in the first place? What has changed? >>which don't beed them for efficiency reasons (the Microsoft C compiler >>inserts pad bytes in the instruction stream to get the target of a branch >>instruction word-aligned), Instruction alignment is very far from the current issue. Some processors (in fact, ONLY those with variable-size instructions, typically non-RISCS's) can be benefited by this, in that the first fetch after a branch will pick up more of a variable-size instruction. >>and simplify the hardware tremendously. This is a major part of the point! If it were a tremendous savings, it would never have been done in the first place. Trust me, if you had a gate-level schematic of a CPU in front of you, you could look at it for weeks and never find the unaligned logic. So many pieces of it you need anyway: rotators to fetch bytes/words, stalls to deal with non-zero wait state memory systems, traps to handle page faults. It's just another small cost-small benefit item. Like a lot of the other stuff that's creeping back into RISC's these days. :-) Stan
huitema@mirsa.inria.fr (Christian Huitema) (03/29/89)
From experience, I can say that not having to bother with alignment can make networking code much faster. Messages exchanged over nets include all sort of variable size components, and padding is both frowned upon and somewhat ineffective -- often, one cannot predict where a structure will start. Thus, being able to simply define: #define integer_value(mess,x) (*(int *)(mess + x)) rather than: #define integer_value(mess,x) \ (((((mess[x]<<8)|mess[x+1])<<8)|mess[x+2]<<8)|mess[x+3]) can speed up a lot the decoding of these messages. Indeed, one can object the relative frequency of network operations vs computing, but the same holds for compression programs, disk accesses, etc... Christian Huitema
henry@utzoo.uucp (Henry Spencer) (03/30/89)
In article <BRUCE.89Mar28134751@tigger.wv.tek.com> bruce@tigger.wv.tek.com (Bruce Robertson;685-2912;;;tigger) writes: >... nothing says you can't write a utility to reorder your >structure members at the C source level, optimizing for a particular >architecture. Obviously, you would have to avoid using this utility >if the order of structure elements matters (you wouldn't want to do it >to the Unix kernel!) If you look at the kernel sources, especially the older parts, you'll find that there is a remarkable tendency for (e.g.) "char" members of structs to be grouped together so they will pack densely. -- Welcome to Mars! Your | Henry Spencer at U of Toronto Zoology passport and visa, comrade? | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
njs@scifi.UUCP (Nicholas J. Simicich) (03/30/89)
Another fairly common convention which limits the utility of reordering structures is that of using a character vector as the last element of the structure to simulate variable length records. One might have a character vector of MAXPATHLEN bytes, but only read or write the number of bytes actually contained in the filename. -- Nick Simicich --- uunet!bywater!scifi!njs --- njs@ibm.com (Internet)
bcase@cup.portal.com (Brian bcase Case) (03/31/89)
>I can't imagine any processor that would go to the trouble of taking >an interrupt at any arbitrary time. The ability to take an interrupt any time falls out of a simple design. RISCs as a class are an example. >>>I don't quite understand the problem. Alignment restrictions are almost >>>zero hassle in most code, are enforced by compilers even on processors >Then, why were the suppliers (IBM, DEC, Motorola, etc) pressured so much >by programmers to add this feature in the first place? What has changed? what has changed: 1) the level of expected performance (mostly through competition), 2) the size of main memory, and 3) the quality of both hardware and software engineers (the processor designers now know what the cost of arbitrary alignment really is, so do software designers). >>>and simplify the hardware tremendously. >This is a major part of the point! If it were a tremendous savings, >it would never have been done in the first place. Trust me, if you >had a gate-level schematic of a CPU in front of you, you could look at >it for weeks and never find the unaligned logic. So many pieces of it >you need anyway: rotators to fetch bytes/words, stalls to deal with >non-zero wait state memory systems, traps to handle page faults. You missed the point. The point is that a fetch of an item that spans a natural boundary requires *two* fetches. This is slower and requires annoying special-case consideration in the design. It complicates things, paving the way for design errors, and lengthens some critical paths. Knowing that two fetches, i.e., extra time, is required for unaligned accesses, software engineers that are concerned with the performance of their code won't do it anyway, even if the hardware supports it! Those that aren't concerned with the performance of their code are welcome to solve the problem anyway they like: use a processor that supports unalignment, emulate it in software, ....
guy@auspex.UUCP (Guy Harris) (03/31/89)
>From experience, I can say that not having to bother with alignment can >make networking code much faster. Messages exchanged over nets include all >sort of variable size components, and padding is both frowned upon and >somewhat ineffective -- often, one cannot predict where a structure will >start. Well, for TCP, UDP, and IP headers, at least, items are aligned on their "natural" boundaries. Not *all* protocols start stuff on random byte boundaries.... >Thus, being able to simply define: > >#define integer_value(mess,x) (*(int *)(mess + x)) > >rather than: > >#define integer_value(mess,x) \ > (((((mess[x]<<8)|mess[x+1])<<8)|mess[x+2]<<8)|mess[x+3]) > >can speed up a lot the decoding of these messages. Umm, the latter appears to have the advantage that it works regardless of the byte order of your processor; messages exchanged over nets are often exchanged between big-endian and little-endian processors, so you can't necessarily just point at some arbitrary location in your message and suck up an "int", even if your processor *does* support unaligned references. >Indeed, one can object the relative frequency of network operations >vs computing, but the same holds for compression programs, Well, "compress" (a common Lempel-Ziv compression program) works on bytes, or on bit strings; the former don't have alignment problems, and the latter have alignment problems that the sort of unaligned references support that is being talked about here won't fix.... (The original "compress" had VAX bit-field instructions jammed into it; it now has that controlled by "#ifdef vax" - has anybody bothered using other processor's bit-field instructions, and has it actually made any difference?) >disk accesses, Huh? "Disk accesses", as I think of them, are usually to sectors, typically of 512 bytes. As for data *on* disk, well: 1) if you write out structures in the native format, they will presumably be aligned properly; 2) if you write them out in the native format for one machine, and read them in on a different machine, you have problems other than alignment, such as big-endian vs. little-endian format - see the comments above on the networking case; 3) if you write them out in some "common" format (e.g., XDR, or ASN.1), you're already doing processing work, so you don't just point some structure pointer at the data and go.
bruce@tigger.wv.tek.com (Bruce Robertson;685-2912;;;tigger) (04/01/89)
In article <BRUCE.89Mar28134751@tigger.wv.tek.com> bruce@tigger.wv.tek.com (Bruce Robertson;685-2912;;;tigger) writes: >... nothing says you can't write a utility to reorder your >structure members at the C source level, optimizing for a particular >architecture. Obviously, you would have to avoid using this utility >if the order of structure elements matters (you wouldn't want to do it >to the Unix kernel!) In article <1989Mar29.232103.6835@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: > If you look at the kernel sources, especially the older parts, you'll > find that there is a remarkable tendency for (e.g.) "char" members of > structs to be grouped together so they will pack densely. What I was thinking of is things like structure definitions of stack frames, device registers, etc. -- Bruce Robertson Domain: bruce@tigger.wv.tek.com UUCP: tektronix!tekecs!tigger!bruce
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (04/01/89)
So far, people have brought up two hardware solutions: - 1. The CPU can just do it. - 2. The CPU can complain (interrupt) when someone tries to do it. No one has mentioned the third alternative. Let me begin by saying that I don't know if IBM PC/RTs still use the third method. When I used the RT (before it was announced) this method was, ahem, a design feature, obeying, ahem, RISC principles. The third method was quit simple: screw up. An unaligned fetch implies two memory fetches. The ROMP chip would do the low-order one, obtain the low-order bits, and put them in the low-order part of the target register. The high-order part of the register would be unchanged. Someone out there in Netland may be saying that it's OK: the compiler can just get everything right. Well, we were writing a compiler, and we didn't get it right, always, at first. It turns out that un-alignedness is a property that is preserved: add 4 to an unaligned pointer, and it still is. It also turned out that this could creep in through lots of little ratholes. Let me explain to you about the statistics of data. If only the low order part of a register is changed, there is a high likelihood that the high order part is still OK. This is because it was probably zero before, and it probably still should be zero. Alternatively, the register held a pointer, and its next value was a pointer to a nearby place. The upshot is that the compiler could generate bad code, which would run fine. Test programs would appear to be perfect - until you tried to have 256 inputs, or linked them with 64KB of library routines, or whatever else it took to cross that lurking threshold. Then the program would pick up bizarre behavior that was hard to backtrack. I don't know who invented this RISC feature, but I had strong feelings about him at the time. -- Don D.C.Lindsay Carnegie Mellon School of Computer Science --
len@synthesis.Synthesis.COM (Len Lattanzi) (04/01/89)
In article <4618@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
:
:So far, people have brought up two hardware solutions:
:
: - 1. The CPU can just do it.
: - 2. The CPU can complain (interrupt) when someone tries to do it.
:
:No one has mentioned the third alternative.
:
[ load low byte(s) into low part of register, leave top alone ]
The MIPS R2000 has this instruction and its cohort load high bytes into
high part of register leaving low alone. I'm not positive but I believe that
f77 will generate them for suitably unaligned common members.
Len Lattanzi (len@Synthesis.com) <{ames,pyramid,decwrl}!mips!synthesis!len>
Synthesis Software Solutions, Inc. 1 408 991 0367
292 Commercial Avenue, Sunnyvale, California 94086