radford@calgary.UUCP (Radford Neal) (02/16/88)
In article <4611@ames.arpa>, lamaster@ames.arpa (Hugh LaMaster) writes:

> Actually, it is extremely important in a networked world for word sizes
> to be a power of two bits long.  As someone who has spent a considerable
> amount of time maintaining code to move data between machines of
> various sizes (in my case, DEC and IBM 8 bit byte/32 bit word machines
> and CDC 60 bit word machines), and helping users convert data written
> on one such machine to another, etc. etc., I can say with complete
> conviction:
>
> Word size (or the size of any addressable data) should ALWAYS be a
> power of two.
>
> In addition to making life much easier for those who have to move
> binary data between machines (and with NFS, there are a lot more
> such people out there), it also makes it much easier to move CODE
> between machines.  Yes, I have converted bit-level code from IBM's
> to Cyber 170's and back to 8 bit machines again.

Please explain what problems you found in converting data from systems
with wordsizes that aren't a power of two.  Given the big-endian vs.
little-endian incompatibility, I would think that a general data typing
scheme would be needed in any case, so odd word sizes would be no
additional hassle.  As for bit-bashing code, a #define for wordsize
would seem to solve the problem.

Off hand, the only advantage to a power-of-two word size that I see is
that you can do the calculations needed to simulate bit addressing
using shifts and ands rather than quotient and remainder.

   Radford Neal
lamaster@ames.arpa (Hugh LaMaster) (02/19/88)
In article <1354@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>
>Please explain what problems you found in converting data from systems
>with wordsizes that aren't a power of two.  Given the big-endian vs.

Example: Write code on a CDC 7600 to create a graphics output file
which, when moved to a PDP-11, is an integral number of 512 byte blocks.
Simple.  You just use 1024 60 bit words on the 7600.  You get
1024x60 bits = 61440 bits.  61440/8 = 7680 bytes = 15 512-byte blocks.
Now, write your code on the 7600 so that you pack the 16 bit PDP words
into the 60 bit CDC words.  It is conceptually trivial to do.  Also,
very time consuming, and you have to do it for each such application.
Which means programming costs.  Convert the code to a 64 bit machine
and rewrite the bit crunching stuff again.  Now, try to maintain
separate versions of the code for each machine.  Then ask yourself how
much easier life would be if you didn't have weird word sizes.

>little-endian incompatibility, I would think that a general data typing

Well, I am a confirmed "standard-endian" (there should be a standard
mapping of words onto bytes in a specific order) myself, because it
does make things easier.  But having congruent word sizes is the
first step.

>scheme would be needed in any case, so odd word sizes would be no
>additional hassle.  As for bit-bashing code, a #define for wordsize
>would seem to solve the problem.

Actually, no.  The code to handle arbitrary bit alignments is much more
complex than if the wordsizes involved are multiples of a power of two.
And much slower than code which knows what the wordsizes are, because
you have to do arithmetic in addition to boolean operations.

>
>Off hand, the only advantage to a power of two word size that I see is
>that you can do the calculations needed to simulate bit addressing
>using shifts and ands rather than quotient and remainder.
Well, actually, you are right that if all machine data types were a
power-of-two multiple of some common base, that would be good enough.
The base could be 9 bit bytes.  It just so happens that with so many
8 bit byte addressable machines out there, 8 bits is the de facto
standard.  The ability to do boolean operations only (no arithmetic)
is almost always a big performance win.

The original argument was that IF you need to move BINARY data between
machines, THEN all the machines involved should have wordsizes that
are a power of two.  Now, do you ever NEED to move BINARY data between
machines?  Well, yes.  Using character coded data for graphics and
floating point applications is orders of magnitude slower.  For that
reason, I am looking forward to the day when all machines use IEEE 32
and 64 bit floating point formats (whether they implement full IEEE
arithmetic or not) in a standard byte order (consistent Big-Endian,
of course :-) ).
peter@athena.mit.edu (Peter J Desnoyers) (02/19/88)
In article <1354@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>In article <4611@ames.arpa>, lamaster@ames.arpa (Hugh LaMaster) writes:
>
>> Word size (or the size of any addressable data) should ALWAYS be a
>> power of two.
>
>Please explain what problems you found in converting data from systems
>with wordsizes that aren't a power of two.  Given the big-endian vs.
>little-endian incompatibility, I would think that a general data typing
>scheme would be needed in any case, so odd word sizes would be no
>additional hassle.  As for bit-bashing code, a #define for wordsize
>would seem to solve the problem.

I spent some time at BBN writing code for the C70, which has a 20 bit
word made up of two 10-bit bytes.  The major problem in porting C code
to it was the 10-bit byte, rather than the 20-bit word.  I think that
any large C program must have 'for (i = 0; i < 8; i++) bash bit;' in
it just to screw up those poor machines.  Unfortunately, the name of
the include file that defines constants for such sizes seems to be
different on every system, thus leading to a different portability
problem...

If you restrict your word size to a multiple of 8 bits, such problems
should go away.  (Most 'portable' code seems to realize that word size
can differ, as well as endian-ness, and handles this in a correct
fashion.)

   Peter Desnoyers
radford@calgary.UUCP (Radford Neal) (02/21/88)
In article <4915@ames.arpa>, lamaster@ames.arpa (Hugh LaMaster) writes:

> Example: Write code on CDC 7600 to create a graphics output file
> which, when moved to a PDP-11, is an integral number of 512 byte blocks.
> Simple.  You just use 1024 60 bit words on the 7600.  You get
> 1024x60 bits = 61440 bits.  61440/8 = 7680 bytes = 15 512 byte blocks.
> Now, write your code on the 7600 so that you pack the 16 bit PDP words
> into the 60 bit CDC words.  It is trivial to do.  Also, very time
> consuming, and you have to do it for each such application.  Which
> means programming costs.  Convert the code to a 64 bit machine and
> rewrite the bit crunching stuff again.  Now, try to maintain separate
> versions of the code for each machine.  Then ask yourself how much
> easier life would be if you didn't have weird word sizes.

Most of your problems seem to come from word sizes that are not a
multiple of eight bits.  Whether they are a power of two seems less of
an issue.  Any difference in word sizes at all will bring problems of
incompatible versions, reporting errors when an integer is too big for
the destination format, etc.

> Well, actually, you are right that if all machine data types were
> a power of two factor of some base, that is good enough.  The base
> could be 9 bit bytes.  It just so happens that with so many 8 bit
> byte addressable machines out there, that is the defacto standard.

I agree.  Despite the purported advantages of 9-bit bytes, having to
cope with standard tape drives (to pick an example) seems like a lot
of hassle.

> Do you ever NEED to move BINARY data between
> machines?  Well, Yes.  Because, using character coded data for graphics
> and floating point applications is orders of magnitude slower.

Certainly true.  Various "remote procedure call" and other network
protocols have a standard interchange format that (ought to) be much
faster than ASCII while still keeping applications independent of data
conversion considerations.

   Radford Neal
bcase@apple.UUCP (Brian Case) (02/22/88)
In article <3049@bloom-beacon.MIT.EDU> peter@athena.mit.edu (Peter J Desnoyers) writes:

>I spent some time at BBN writing code for the C70, which has a 20
>bit word made up of two 10-bit bytes.  The major problem in porting C
>code to it was the 10-bit byte, rather than the 20-bit word.
>
>If you restrict your word size to a multiple of 8 bits, such problems
>should go away.  (Most 'portable' code seems to realize that word size
>can differ, as well as endian-ness, and handles this in a correct
>fashion.)

Er, well, 8-bit bytes are part of the solution.  The next problem
comes when you ask the question "What word is the Nth byte in, given
that the array starts at this base address?"  On a machine with 2^n
8-bit bytes per word, the answer is easy: just shift N down by
(little) n and add the result to the base address.  Now assume that
you have 6 8-bit bytes per word.  How do you compute the word offset?
It requires a divide by two and a divide by three.  Good luck (sure,
Burroughs did it, but that doesn't make it right).