gnu@hoptoad.UUCP (04/05/87)
I stand somewhat corrected. The AMD 29000 does have byte insert and extract operations. They work register-to-register, and take one cycle. They *do* depend on state hidden in the system's status register (the low 2 bits of the last address loaded) so an optimizer is not free to move the instruction past another load, but they're better than shifts and masks for sure.

At the time of my posting, I didn't have the real 29000 manual; it has since arrived. (Thanx to Phil Ngai for posting how to order it.)

I said:
> In other words, the designers of the 29000 did not think at all
> about typical Unix code...

In article <15337@amdcad.UUCP>, tim@amdcad.UUCP (Tim Olson) writes:
> Ok, we have taken deep breaths and counted to 10, so now we can respond
> without flaming (that comment *is* a personal insult).

Sorry for the 'personalness' of the insult; it was unintended. My complaint was with the architecture, not the designers. I said it badly, and I'm sorry.

> One of the goals
> of the Am29000 is flexibility for the user. The Am29000 allows you to
> either use the byte manipulation instructions in conjunction with a word-
> oriented memory system *or* implement a byte-oriented memory system and
> use the option bits in the load/store control field to select the size of
> a memory access. The flexibility is there...

Does this mean that a compiler can't tell whether a load instruction will return the entire word, or just a byte? It would seem that you'd need two different compilers, libraries, etc for a machine that really implemented a byte-oriented memory for the 29000. If the compiler generated a 'load' and then an 'extract byte', it would extract the wrong byte, since the 'load' would have already extracted the relevant byte into the low end of the destination register.

As I recall, the 29000 also has a mode in which it will trap to the OS on any unaligned access to memory; this would seem to make both of the above methods fail.
(Normally it ignores the low 2 bits of the address.) Wouldn't this require a third compiler? And how would you ever fetch unaligned bytes with this mode turned on? You could mask the address before using it, do the word load, then somehow move the 2 low bits into the system status register (may take a few instructions), then do the extract byte, but it all seems very slow and cumbersome as a way to get one character from RAM.

> > First you see if the operands overlap, then...are they aligned, then...
> > are they wider than a word, then...wider than two words?, then...

Tim objected to this characterization of how to do strcpy on a word oriented machine. However, it still looks to me like a good description of the strcpy code they posted.

I do like the funnel shifter.

> To see the 8088 and the Am29000 compared, as if they were comparable, is
> worse than our worst nightmares.

AMD sells them both, does it not?
-- 
Copyright 1987 John Gilmore; you can redistribute only if your recipients can.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu    gnu@ingres.berkeley.edu
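For readers without the manual, the byte extract and insert operations argued about here can be modelled in a few lines of C. This is a minimal sketch, not the Am29000's definition: it assumes big-endian byte numbering within a word (byte 0 is the most significant), treats memory as an array of aligned 32-bit words, and the names `extract_byte`, `insert_byte`, `load_byte` and `store_byte` are hypothetical. The `bp` argument stands in for the low two address bits that, per the thread, the hardware remembers in a status register after each load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: byte 0 of a word is the most significant (big-endian). */
static uint32_t extract_byte(uint32_t word, unsigned bp)
{
    assert(bp < 4);
    unsigned shift = (3u - bp) * 8u;   /* bit position of byte bp */
    return (word >> shift) & 0xFFu;    /* selected byte, zero-extended */
}

static uint32_t insert_byte(uint32_t word, unsigned bp, uint32_t byte)
{
    assert(bp < 4);
    unsigned shift = (3u - bp) * 8u;
    uint32_t mask = 0xFFu << shift;
    return (word & ~mask) | ((byte & 0xFFu) << shift);
}

/* Word-oriented memory: only whole, aligned words can be read or written. */
static uint32_t load_byte(const uint32_t *mem, uint32_t addr)
{
    uint32_t word = mem[addr >> 2];        /* word load ignores the low 2 bits */
    return extract_byte(word, addr & 3u);  /* low 2 bits pick the byte */
}

static void store_byte(uint32_t *mem, uint32_t addr, uint32_t byte)
{
    /* Read-modify-write: load the word, insert the byte, store the word. */
    mem[addr >> 2] = insert_byte(mem[addr >> 2], addr & 3u, byte);
}
```

With `mem = {0x41424344, ...}`, `load_byte(mem, 3)` yields `0x44`. The `store_byte` path also shows why, on word-oriented memory, every byte store turns into a load, an insert and a store.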
bcase@amdcad.UUCP (04/06/87)
In article <1960@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>I stand somewhat corrected. The AMD 29000 does have byte insert and
>extract operations. They work register-to-register, and take one
>cycle. They *do* depend on state hidden in the system's status
>register (the low 2 bits of the last address loaded) so an optimizer is
>not free to move the instruction past another load, but they're better
>than shifts and masks for sure.

Yeah, the residual control has its restrictions, but an optimizer *can* move instructions between the load and the byte-insert/extract.

>> One of the goals
>> of the Am29000 is flexibility for the user. The Am29000 allows you to
>> either use the byte manipulation instructions in conjunction with a word-
>> oriented memory system *or* implement a byte-oriented memory system and
>> use the option bits in the load/store control field to select the size of
>> a memory access. The flexibility is there...
>
>Does this mean that a compiler can't tell whether a load instruction
>will return the entire word, or just a byte? It would seem that you'd
>need two different compilers, libraries, etc for a machine that really
>implemented a byte-oriented memory for the 29000. If the compiler
>generated a 'load' and then an 'extract byte', it would extract the
>wrong byte, since the 'load' would have already extracted the
>relevant byte into the low end of the destination register.

You are sorta right here, but rather than two compilers, one compiler with a switch should be sufficient. There are library implications as well. However, one system has one kind of memory, so a set of tools for that system will do the right things. The main reason we included the byte-oriented memory support is for the obscure controller application where the cost for byte-oriented memory is low (maybe this is when the total amount of memory is small?) but the cost for software byte-support is high.
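The "one compiler with a switch" answer can be pictured as a code generator that emits different sequences for `c = *p` depending on the kind of memory the target has. A minimal sketch under stated assumptions: the enum `mem_kind` and the functions below are hypothetical, byte numbering is assumed big-endian, and the only behaviour modelled is the one Gilmore describes, namely that a byte-sized load on byte-oriented memory leaves the byte right-justified in the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target switch: which kind of memory the system has. */
enum mem_kind { WORD_MEMORY, BYTE_MEMORY };

/* Big-endian byte select, as done by the extract step on word memory. */
static uint32_t select_byte(uint32_t word, unsigned bp)
{
    assert(bp < 4);
    return (word >> ((3u - bp) * 8u)) & 0xFFu;
}

/* Model of what a load returns on each kind of memory. */
static uint32_t mem_load(enum mem_kind kind, const uint32_t *mem,
                         uint32_t addr, int byte_sized)
{
    uint32_t word = mem[addr >> 2];
    if (kind == BYTE_MEMORY && byte_sized)
        return select_byte(word, addr & 3u); /* memory returns the byte, right-justified */
    return word;                             /* whole aligned word */
}

/* The sequence emitted for `c = *p` depends on the switch. */
static uint32_t fetch_char(enum mem_kind kind, const uint32_t *mem, uint32_t addr)
{
    if (kind == BYTE_MEMORY)
        return mem_load(kind, mem, addr, 1);                     /* sized load only */
    return select_byte(mem_load(kind, mem, addr, 0), addr & 3u); /* word load + extract */
}
```

Both settings fetch the same character. Mixing them does not work: running `select_byte` on the result of a byte-sized load, as code built for word-oriented memory would, returns zero instead of the character for any address whose low two bits are not 3. That is why the libraries have to match the memory system too.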
>As I recall, the 29000 also has a mode in which it will trap to the OS
>on any unaligned access to memory; this would seem to make both of
>the above methods fail. (Normally it ignores the low 2 bits of the
>address.) Wouldn't this require a third compiler? And how would you
>ever fetch unaligned bytes with this mode turned on? You could mask
>the address before using it, do the word load, then somehow move the
>2 low bits into the system status register (may take a few instructions),
>then do the extract byte, but it all seems very slow and cumbersome
>as a way to get one character from RAM.

Well, the unaligned-access trap facility exists in order to provide some level of support for "old" databases. That is, lots of machines allow access to any size of data on any boundary (unaligned 32-bit words, unaligned 16-bit halfwords). There must (we guess) be many databases out there that were created under the assumption that such access will always be possible. A program accessing these databases can, on the Am29000, be run with this trap enabled. Whenever the lower two address bits are not both zero (and, therefore, the access might be unaligned), a trap will be taken and the situation can be correctly dealt with. Yeah, it's grungy, but it can work to provide compatibility, and it costs very little in the implementation. Note that the compiler must assume the second method of byte accesses (option bits are used to select the access size), making this trap a supplementary facility (not a third method). If the memory provides support for variable-size accesses *and* unaligned variable-size accesses, then the trap need not be turned on.

>> > First you see if the operands overlap, then...are they aligned, then...
>> > are they wider than a word, then...wider than two words?, then...
>
>Tim objected to this characterization of how to do strcpy on a word
>oriented machine. However, it still looks to me like a good
>description of the strcpy code they posted.
Well, in the code we posted (and that I "wrote" with the help of my C compiler), all the alignment and overlap checking is done before the movement (or compare) method is chosen. Yes, there is some overhead, but the effect has been demonstrated to be positive (although for *very* short strings it might not be. Sigh, there are so many variables in computer architecture.)

>I do like the funnel shifter.
>> To see the 8088 and the Am29000 compared, as if they were comparable, is
>> worse than our worst nightmares.
>
>AMD sells them both, does it not?

Sigh, you got me there (but *I* had nothing to do with the decision to carry the 8088 :-), but seeing the 8088 and the Am29000 compared is *still* worse than our worst nightmares.

bcase
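The thread says that with the unaligned-access trap enabled, the unaligned case "can be correctly dealt with", and Gilmore singles out the funnel shifter. One way a trap handler might emulate an unaligned 32-bit read on word-oriented memory is two aligned word loads followed by a funnel shift across the pair. This is a sketch only: it assumes big-endian byte numbering and an array-of-words memory, `funnel_shift` and `load_word_unaligned` are hypothetical names, and it is neither AMD's trap handler nor the exact definition of the Am29000's funnel-shift instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Funnel shift: view hi:lo as one 64-bit value and return the 32 bits
 * that start `count` bits below the top (0 <= count < 32). */
static uint32_t funnel_shift(uint32_t hi, uint32_t lo, unsigned count)
{
    assert(count < 32);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (32u - count));
}

/* Unaligned 32-bit read from big-endian, word-oriented memory. */
static uint32_t load_word_unaligned(const uint32_t *mem, uint32_t addr)
{
    uint32_t first = mem[addr >> 2];
    unsigned off = addr & 3u;
    if (off == 0)
        return first;                     /* aligned: one load is enough */
    uint32_t second = mem[(addr >> 2) + 1];
    return funnel_shift(first, second, off * 8u);
}
```

With `mem = {0x00112233, 0x44556677}`, reading at address 1 gives `0x11223344`. The cost is two loads and a shift per access, which is acceptable for the compatibility case the trap is meant for, but it is no way to run ordinary code.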
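bcase describes doing the alignment checks up front and only then choosing a movement method. The sketch below shows that shape for a string copy on a hypothetical memory model, an array of big-endian 32-bit words addressed by byte. If both addresses are word-aligned, it moves whole words. Otherwise it falls back to a byte loop. The zero-byte test is the well-known `(w - 0x01010101) & ~w & 0x80808080` trick, used here for illustration only; nothing in the thread says the posted Am29000 code used it. Overlap checking is omitted because strcpy's operands must not overlap, and all function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory model: big-endian 32-bit words, byte addresses. */
static uint32_t get_byte(const uint32_t *mem, uint32_t addr)
{
    return (mem[addr >> 2] >> ((3u - (addr & 3u)) * 8u)) & 0xFFu;
}

static void put_byte(uint32_t *mem, uint32_t addr, uint32_t b)
{
    unsigned shift = (3u - (addr & 3u)) * 8u;
    mem[addr >> 2] = (mem[addr >> 2] & ~(0xFFu << shift)) | ((b & 0xFFu) << shift);
}

/* Nonzero iff some byte of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Copy the NUL-terminated string at src to dst.
 * Returns the number of bytes copied, including the NUL. */
static uint32_t copy_string(uint32_t *mem, uint32_t dst, uint32_t src)
{
    uint32_t n = 0;
    if (((dst | src) & 3u) == 0) {
        /* Both word-aligned: move whole words while no byte is zero. */
        while (!has_zero_byte(mem[(src + n) >> 2])) {
            mem[(dst + n) >> 2] = mem[(src + n) >> 2];
            n += 4;
        }
    }
    /* Misaligned operands, or the word holding the terminator: byte loop. */
    for (;;) {
        uint32_t b = get_byte(mem, src + n);
        put_byte(mem, dst + n, b);
        n++;
        if (b == 0)
            return n;
    }
}
```

The word loop stops at the word that holds the terminator and finishes with the byte loop, so nothing in the destination past the NUL is overwritten. The alignment test and branch on entry are the kind of overhead bcase concedes may not pay off for very short strings.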