cik@l.cc.purdue.edu (Herman Rubin) (11/17/90)
In article <5961@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <2732@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> > Well, here is an example which will definitely not work on all machines.
> > It is desired to move 3 bytes into the three least significant bytes of
> > a 4-byte word (other sizes are appropriate), and repeat this operation.
> > This can be done efficiently by using pointers to words on a machine with
> > unaligned reads. Even with overhead for unaligned reads, it is hard to
> > see how to do this efficiently otherwise.

> It's hard to see what this has to do with the array vs. pointer
> issue which was being discussed. It looks to me like you've got
> a word that's being 'mapped' as an array of bytes and your loop
> is filling the bottom three bytes. Where are the pointers?

I neglected to mention that one does not care what goes into the leading
byte. The use of pointers here is NOT conforming to the usual standards.
The code for this would be

        *y++ = *x++;
        decrease x, treated as a byte pointer, by 1;

This uses one unaligned read and one aligned write, as compared to
three reads and writes.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
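A rough C sketch of the two-line fragment above, for readers who want something that compiles. The function name gather_overlapped and its parameters are hypothetical and not from the post. It uses memcpy instead of dereferencing a misaligned word pointer, so it stays within standard C; on hardware with cheap unaligned loads a compiler may turn the copy into a single load, but that is not guaranteed. Which three bytes of each word end up as the "least significant" ones depends on the host byte order and on where the source pointer starts, so the sketch only promises that word i holds source bytes 3*i through 3*i+3 in memory order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): copy n overlapping 4-byte
 * groups from a packed byte stream into n aligned 32-bit words.
 * One byte of every word belongs to a neighbouring datum and is treated
 * as "don't care". The caller must make 3*n + 1 bytes readable at src.
 */
void gather_overlapped(uint32_t *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n-- > 0) {
        memcpy(dst, src, sizeof *dst); /* one 4-byte read, safe if unaligned */
        dst += 1;                      /* y++ */
        src += 3;                      /* x++ as a word, then back one byte */
    }
}
```

Note the extra readable byte at the end of the source: the last 4-byte read reaches one byte past the final 3-byte datum.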
jlg@lanl.gov (Jim Giles) (11/18/90)
From article <2742@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> [...]
> *y++ = *x++;
> decrease x, treated as a byte pointer, by 1;

Ok. I see now. I interpreted your initial request differently.
The equivalent non-pointer version is (including the declarations
you left out):

      bit.32 :: y(Number_of_elements)
      bit.24 :: x(Number_of_elements)
      ...
      do i=1,Number_of_elements
         y(i) = x(i)
      end do

> [...]
> This uses one unaligned read and one aligned write, as compared to
> three reads and writes.

So does my version. At least, assuming that 32-bit numbers are
aligned. (Note: 'bit' data types are unsigned; for signed integers,
the declaration is 'int'. Your use of pointers is thus seen as
an example of needing to get around restrictions caused by inadequate
control over data types.)

J. Giles
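For comparison, here is a minimal C sketch of the value semantics of the bit.24 to bit.32 assignment above: each packed 24-bit element is zero-extended into a 32-bit word. The name widen_u24 is hypothetical, and the byte order of the packed elements (most significant byte first) is an assumption made only for illustration; the post does not specify it. The sketch reads three separate bytes per element. Whether a compiler for a language with 24-bit types would instead issue one unaligned word load, as claimed above, is up to that implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen n packed 24-bit unsigned values to 32 bits.
 * Assumption (not stated in the thread): each 3-byte element is stored
 * most significant byte first. The top byte of every result is zero,
 * matching the unsigned 'bit' types described in the post.
 */
void widen_u24(uint32_t *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = src + 3 * i;
        dst[i] = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
    }
}
```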
hrubin@pop.stat.purdue.edu (Herman Rubin) (11/20/90)
In article <6291@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <2742@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
> > [...]
> > *y++ = *x++;
> > decrease x, treated as a byte pointer, by 1;
>
> Ok. I see now. I interpreted your initial request differently.
> The equivalent non-pointer version is (including the declarations
> you left out):
>
>       bit.32 :: y(Number_of_elements)
>       bit.24 :: x(Number_of_elements)
>       ...
>       do i=1,Number_of_elements
>          y(i) = x(i)
>       end do
>
> > [...]
> > This uses one unaligned read and one aligned write, as compared to
> > three reads and writes.
>
> So does my version. At least, assuming that 32-bit numbers are
> aligned. (Note: 'bit' data types are unsigned; for signed integers,
> the declaration is 'int'. Your use of pointers is thus seen as
> an example of needing to get around restrictions caused by inadequate
> control over data types.)

Your version may be far less efficient, depending on the hardware. At
least on some hardware, an unaligned read is not that much slower than
an aligned read. Also, the instruction sequences are different. Mine
says to take 4 bytes at a particular location and move them to another,
and then change the pointer for the source location. Yours says to form
a 3 byte unit at a given location, and move them into the three bytes
at the destination.

As far as the result, there is no important difference. But as far as
the implementation, there will be. Yours would be interpreted as taking
those 3 bytes only, and moving them to the destination, with some
convention about the 4th byte at the destination. Mine will take 4
bytes, and just move them to the destination. It involves no processing,
other than getting and putting the bytes. It will not work on all
machines, but it will be much faster on the ones on which it works.

Now conceivably something could be added to the language to allow the
unaligned read/aligned write procedure if the hardware will permit it.
But it still takes the programmer to let the compiler know this. There
are other situations calling for unaligned reads/writes if they can be
done. Pointers allow the user who understands the hardware to implement
them.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
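The distinction drawn in this post can be illustrated with a small C sketch. The names load4, load3, and same_payload are hypothetical. load4 moves four bytes as a unit and leaves the fourth byte as whatever follows the datum; load3 moves only the three data bytes and, by one possible convention, zeroes the fourth. The comparison is done in memory order, so which byte counts as the numerically "leading" one still depends on the host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Move four bytes as a unit; the fourth is whatever follows the datum. */
uint32_t load4(const unsigned char *p)
{
    uint32_t w;
    assert(p != NULL);
    memcpy(&w, p, 4);
    return w;
}

/* Move only the three data bytes; by convention the fourth byte is zero. */
uint32_t load3(const unsigned char *p)
{
    uint32_t w = 0;
    assert(p != NULL);
    memcpy(&w, p, 3);
    return w;
}

/* Returns 1 when two words agree in their first three bytes in memory. */
int same_payload(uint32_t a, uint32_t b)
{
    return memcmp(&a, &b, 3) == 0;
}
```

Both loads deliver the same three data bytes; they differ only in the byte nobody cares about, which is the sense in which the results have "no important difference".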
jlg@lanl.gov (Jim Giles) (11/29/90)
From article <16900@mentor.cc.purdue.edu>, by hrubin@pop.stat.purdue.edu (Herman Rubin):
> In article <6291@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
>> From article <2742@l.cc.purdue.edu>, by cik@l.cc.purdue.edu (Herman Rubin):
>> > [...]
>> > *y++ = *x++;
>> > decrease x, treated as a byte pointer, by 1;
> [...]
> Your version may be far less efficient, depending on the hardware. At
> least on some hardware, an unaligned read is not that much slower than
> an aligned read. Also, the instruction sequences are different. Mine
> says to take 4 bytes at a particular location and move them to another,
> and then change the pointer for the source location. Yours says to form
> a 3 byte unit at a given location, and move them into the three bytes
> at the destination.

Oh. I _still_ misunderstood what you were after. Well, how about this:

      Type int_overlay
         int.32 :: field
      End type int_overlay

      Int.32 :: y(0:Number_of_elements-1)
      int.8 :: x(0:3*Number_of_elements)
      map x as int_overlay

      do i = 0,Number_of_elements-1
         y(i) = x(3*i).field
      end do

Remember what I have said before that mapping is a _storage_ associated
operation. This code picks up a four byte field every three bytes
through the X array and stores these values in the Y array. Assuming
that 32 bits is a word alignment, this does one unaligned load of a
whole word and one aligned store of that word per trip through the loop.

> [...] It will not work on all machines, but it will be
> much faster on the ones on which it works.

I prefer using portable features whenever possible. Which is why I
support the addition of features like mappings to existing languages
(or new ones for that matter). If mappings were available in your
language, the above code would port everywhere. To be sure, the code
might be very inefficient on hardware which _really_ penalizes unaligned
memory traffic - but your pointer version would be too AND it wouldn't
port everywhere.

J. Giles
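The mapped loop above translates fairly directly into indexed C. The sketch below is an approximation under stated assumptions, not the mapping feature itself: the names overlay_bytes_needed and overlay_copy are hypothetical, and memcpy stands in for the overlay so that the unaligned access stays legal in standard C. It also makes explicit why x is declared with bounds 0:3*Number_of_elements, that is, 3*n + 1 bytes: the field for the last element starts at offset 3*(n-1) and covers four bytes, ending at offset 3*n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes the source needs so the last 4-byte field stays in bounds:
 * the field for element n-1 covers offsets 3*(n-1) .. 3*n inclusive. */
size_t overlay_bytes_needed(size_t n)
{
    return n == 0 ? 0 : 3 * n + 1;
}

/* Indexed form of y(i) = x(3*i).field, with an explicit bounds check.
 * Returns 0 on success, -1 if x is too short for n elements. */
int overlay_copy(uint32_t *y, size_t n, const unsigned char *x, size_t x_len)
{
    assert(y != NULL && x != NULL);
    if (x_len < overlay_bytes_needed(n))
        return -1;
    for (size_t i = 0; i < n; i++)
        memcpy(&y[i], &x[3 * i], sizeof y[i]); /* 4-byte field at stride 3 */
    return 0;
}
```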