pmk@hall.cray.com (Peter Klausler) (11/03/88)
In article <998@l.cc.purdue.edu>, Herman Rubin proposes that HW be able to: > Find the distance to the next one in a bit stream FAST. It would > be good to have an exception handler if one is not found. I am considering > algorithms not worth implementing if the operation is slow. This is the "leading zero count" instruction common to Cray instruction sets. The CFT77 compiler provides an intrinsic LEADZ function to get at this instruction. The Cray-2 also has a vector version. The leading zero count and population count (add up the 1 bits) instructions are fertile ground for some nifty bit-twisting tricks. For example, the code sequence I generate for the C logical negation operator ("!") is: S1 ZS0 ; take leading zero count of argument S1 S1>6 ; shift right six bits This turns 0 into 1 and everything else into 0 on a 64-bit machine. Population count can be handy when applied to vector masks. After constructing an iota vector from a mask, the pop count gives you its length. I've never seen these useful operations in another architecture, though. Have you? I'd be interested in hearing about it. I'm ignorant of HW design, but they can't be too hard to implement. -Peter Klausler / Cray Research compiler development / Mendota Heights, MN
smryan@garth.UUCP (Steven Ryan) (11/05/88)
>I've never seen these useful operations in another architecture, though. >Have you? I'd be interested in hearing about it. I'm ignorant of HW design, >but they can't be too hard to implement. > >-Peter Klausler / Cray Research compiler development / Mendota Heights, MN Obviously collusion of CDC and Cray management. CDC Cyber 205 provided all these and more. The compiler group, however, never knew anything about what was in a Cray. -- -- s m ryan +-------------------------------+----------------------------------------------+ | Home of the brave and land of | Congress shall make no law respecting the | | [deleted by the authority of | establishment of religion,...; or abridging | | the Official Secrets Act]. | the freedom of speech, or of the press;... | +-------------------------------+----------------------------------------------+
aglew@urbsdc.Urbana.Gould.COM (11/06/88)
>This is the "leading zero count" instruction common to Cray instruction sets. >The CFT77 compiler provides an intrinsic LEADZ function to get at this >instruction. The Cray-2 also has a vector version. > >The leading zero count and population count (add up the 1 bits) instructions are >fertile ground for some nifty bit-twisting tricks. For example, the code >sequence I generate for the C logical negation operator ("!") is: > > S1 ZS0 ; take leading zero count of argument > S1 S1>6 ; shift right six bits > >This turns 0 into 1 and everything else into 0 on a 64-bit machine. > >Population count can be handy when applied to vector masks. >After constructing an iota vector from a mask, the pop count gives you >its length. > >I've never seen these useful operations in another architecture, though. >Have you? I'd be interested in hearing about it. I'm ignorant of HW design, >but they can't be too hard to implement. > >-Peter Klausler / Cray Research compiler development / Mendota Heights, MN Gould had the SACZ (shift and count zeroes) instruction, which actually shifted on early models, but decoded in one cycle on high end models. MC680x0s have BFF0 (bit find first 0) (I always get the mnemonic for this wrong). Many other machines have similar. POPulation count instructions are found in most machines that have had NSA interest.
newbery@rata.vuw.ac.nz (Michael Newbery) (11/08/88)
The Burrogues B6700 Leading One Test instruction had the mnemonic LOG2 since that is what the op approximates to. It also had a CBON (count binary ones) instruction. One thing that is difficult to do in software (counter examples are welcomed please!) and easy to do in hardware is to reverse a bit stream. Michael Newbery <NEWBERY@rs1.vuw.ac.nz>
thomas@uplog.se (Thomas Hameenaho) (11/08/88)
In article <28200233@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes: >MC680x0s have BFF0 (bit find first 0) (I always get the mnemonic >for this wrong). > Not only the mnemonics! Its BFFFO, Bit Field Find First One. ^^^ -- Real life: Thomas Hameenaho Email: thomas@uplog.{se,uucp} Snail mail: TeleLOGIC Uppsala AB Phone: +46 18 189406 Box 1218 Fax: +46 18 132039 S - 751 42 Uppsala, Sweden
loeliger@mozart.uucp (Jon Loeliger) (11/10/88)
In article <10882@hall.cray.com> pmk@hall.cray.com (Peter Klausler) writes: >Population count can be handy when applied to vector masks. >After constructing an iota vector from a mask, the pop count gives you >its length. > >I've never seen these useful operations in another architecture, though. >Have you? I'd be interested in hearing about it. I'm ignorant of HW design, >but they can't be too hard to implement. > >-Peter Klausler / Cray Research compiler development / Mendota Heights, MN Yep, they are used elsewhere. Convex C1 and C2 both have trailing-zero-count and population counts. There is actually an esoteric reason to prefer a leading zero count over a trailing count.... If I'm not mistaking, there is also now a leading-ones-place instruction microcoded into the C2 set. Disclaimer: I'm not a hardware jock, just a compiler weenie... jdl ------------------------------------------------------------------------------ Jon Loeliger loeliger@convex.com Convex Computer Corporation "You should never attack an anarchist. He 701 N Plano, Richardson, TX 75081 has nothing to lose." Joe Bob's friend ------------------------------------------------------------------------------ Jon Loeliger loeliger@convex.com Convex Computer Corporation "You should never attack an anarchist. He 701 N Plano, Richardson, TX 75081 has nothing to lose." Joe Bob's friend
dik@cwi.nl (Dik T. Winter) (11/11/88)
In article <705@convex.UUCP> loeliger@convex.com (Jon Loeliger) writes: > Yep, they are used elsewhere. Convex C1 and C2 both have trailing-zero-count > and population counts. There is actually an esoteric reason to prefer a > leading zero count over a trailing count.... The best reason is of course that, given a popcount instruction, getting a trailing zero count takes only four instructions(*). Getting a leading zero count takes a few more. (*) POPCOUNT(AND(NOT(X),X-1)) -- dik t. winter, cwi, amsterdam, nederland INTERNET : dik@cwi.nl BITNET/EARN: dik@mcvax
bgb@ihlpg.ATT.COM (Beuning) (11/12/88)
If anyone remembers the chess 0.5 program published in BYTE magazine a long time ago, it had two functions that did these functions in software. They use the idea of a bit board. A bit board has 64 bits (one for each chess square) with a bit being 1 if something interesting was in that square. For example there is a bit board for all white knights. The pop count function (cntrs in the program) would count the number of white knights and the leading zero function (nxtrs in the program) would find the first square that held a white knight. The nxtrs function was the highest time using function so the authors provided an assembly language version that made use of the floating point normalize instruction.
wsmith@m.cs.uiuc.edu (11/20/88)
>One thing that is difficult to do in software (counter examples are welcomed >please!) and easy to do in hardware is to reverse a bit stream. What exactly to you mean by bit stream? If you mean a single word, it is very easy. If it is more than a word the same trick may be used on a larger scale: (This is actually Pete Madany's code on loan.) union WC { unsigned int w; unsigned char c[4]; } in, out; out.c[3] = table[in.c[0]]; out.c[2] = table[in.c[1]]; out.c[1] = table[in.c[2]]; out.c[0] = table[in.c[3]]; The 256 byte array, table, begins { 0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0 ... } Bill Smith wsmith@cs.uiuc.edu uiucdcs!wsmith
cik@l.cc.purdue.edu (Herman Rubin) (11/20/88)
In article <3300039@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes: > > >One thing that is difficult to do in software (counter examples are welcomed > >please!) and easy to do in hardware is to reverse a bit stream. > > What exactly to you mean by bit stream? [Rest of article deleted.] By _bit stream_ I mean a consecutive collection of bits. These bits can have an arbitrary starting and stopping point. As an example of the use of bit streams, there are methods for generating non-uniform random variables which make heavy use of the following bit stream operations. Find the distance to the next one in a bit stream. Branch on the next bit in the stream. In these situations, the bits read are removed from the stream (i.e., the pointer is advanced). Also, it is necessary to create an exception if the stream is exhausted. Since these are tools, and other algorithms are available which are computationally more complex but are better suited to computer architecture, slow methods of accomplishing the ends will not be used. (BTW, the different methods yield different results.) It should be evident that a procedure which only looks at a few bits is computationally simpler than the addition of two numbers. But if it takes 20 times as long, it will be only used if absolutely necessary. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
dtynan@sultra.UUCP (Der Tynan) (11/22/88)
In article <3300039@m.cs.uiuc.edu>, wsmith@m.cs.uiuc.edu writes: > > >One thing that is difficult to do in software (counter examples are welcomed > >please!) and easy to do in hardware is to reverse a bit stream. > > What exactly to you mean by bit stream? If you mean a single word, it is > very easy. If it is more than a word the same trick may be used on a larger > scale: (This is actually Pete Madany's code on loan.) > > The 256 byte array, table, begins { 0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0 ... } > > Bill Smith The original author was talking about bit-reversal within a word. The classic example of this, is the addressing problem in a Fast Fourier Transform. The address bits must be reversed, during the last iteration. The problem is not so much the reversal (as shown by your table), but the fact that it is dependant on the transform size. For example, a 64-pt FFT must be bit-reversed in six bits. Your table can be used for this, but the result must be shifted twice. In real architectures, the bit-reversal can be anywhere from six to twenty-one bits. Thus, the table approach won't work. TI has a barrel shifter which does this (to sixteen bits -- the partnumber escapes me) bit-reversal, which goes to prove the original point, that it can be done in hardware easier than software. - Der -- dtynan@zorba.Tynan.COM (Dermot Tynan @ Tynan Computers) {apple,mips,pyramid,uunet}!Tynan.COM!dtynan --- If the Law is for the People, then why do we need Lawyers? ---