cleary@husc9.harvard.edu (Kenneth Cleary) (08/31/90)
While I find discussions of hardware optimizations achieved through improvement of the electronic circuitry (such as GaAs and higher-density VLSI) and design improvements like pipelining or instruction caching interesting, I would like to hear about examples of optimized algorithms implemented as digital circuits. It is not difficult to find a few books on optimized algorithms implemented as software, but I'm having difficulty finding out about the hardware aspects. (I'm not specifically referring to digital circuit optimizations like gate-count reduction, though that also interests me.) Most hardware optimizations I've seen assume a fixed algorithm which is to be implemented as efficiently as possible, rather than looking at the desired input and desired output and attempting to find the most efficient algorithm.

SOFTWARE CASE IN POINT:
	Donald E. Knuth, The Art of Computer Programming,
	vol. 2, Seminumerical Algorithms,
	page 278, section 4.3.3, How Fast Can We Multiply?

Knuth talks about getting faster multiplication than order-n**2 behavior. He describes some methods of getting behavior of n**(lg 3), or about n**1.585, but this only starts to provide a significant boost as the size of the numbers being multiplied becomes quite large. (I.e., it might not be worth it for 8-bit and 16-bit numbers, but may be competitive for 32-bit and 64-bit.)

HEAVY PARAPHRASING OF KNUTH BEGINS...
If we have two n-bit [let's say 32-bit] numbers {underscores for subscripts}

	u = (u_31, u_30, ..., u_1, u_0)  and  v = (v_31, v_30, ..., v_1, v_0)  {individual bits}

we can write

	u = (2**16)U_1 + U_0
	v = (2**16)V_1 + V_0

basically breaking each number into its most-significant half and least-significant half, so

	uv = ((2**32)+(2**16))(U_1)(V_1) + (2**16)((U_1)-(U_0))((V_0)-(V_1))
	     + ((2**16)+1)(U_0)(V_0)

"This formula reduces the problem of multiplying [32-bit] numbers to three multiplications of [16-bit] numbers... plus some simple shifting and adding operations."
This breaking up into two parts is done recursively, breaking the number into smaller and smaller chunks. {Reminds me of quicksort.} PARAPHRASING ENDS. {My apologies for typos.}

So, while there may not have been demand for this sort of approach with smaller 8-bit chunks, it may provide so much of a boost at 64 bits as to make it even more worthwhile to build 64-bit machines for handling 64-bit numbers, instead of chopping them into two 32-bit chunks to be handled by a 32-bit processor. (Wait a minute... "breaking into two chunks" sounds familiar...) Oh well! It's late, and I think I should stop typing right n.............
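[Editor's sketch: in Python, where integers are unbounded, Knuth's three-multiplication recurrence above can be written directly. The function name, the sign/magnitude handling of the cross term, and the 8-bit cutoff are illustrative choices, not Knuth's.]

```python
def karatsuba(u, v, n=32):
    """Multiply two n-bit integers with Knuth's three-multiply recurrence
    (Karatsuba-Ofman); n is assumed to be a power of two."""
    if n <= 8:                       # small enough: ordinary multiplication
        return u * v
    h = n // 2
    mask = (1 << h) - 1
    u1, u0 = u >> h, u & mask        # most- and least-significant halves
    v1, v0 = v >> h, v & mask
    p1 = karatsuba(u1, v1, h)        # (U_1)(V_1)
    p0 = karatsuba(u0, v0, h)        # (U_0)(V_0)
    du, dv = u1 - u0, v0 - v1        # these differences may be negative...
    sign = -1 if (du < 0) != (dv < 0) else 1
    cross = sign * karatsuba(abs(du), abs(dv), h)  # ...so recurse on magnitudes
    # uv = (2**n + 2**h) U1 V1  +  2**h (U1-U0)(V0-V1)  +  (2**h + 1) U0 V0
    return ((1 << n) + (1 << h)) * p1 + (1 << h) * cross + ((1 << h) + 1) * p0
```

Only three half-width multiplies appear per level; everything else is shifts and adds, which is what makes the recurrence attractive for hardware or microcode.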
Publius@dg.dg.com (Publius) (09/01/90)
In article <4047@husc6.harvard.edu> cleary@husc9.UUCP (Kenneth Cleary) writes:
>While I find discussions of hardware optimizations achieved through
>improvement of the electronic circuitry (such as GaAs, and higher-density
>VLSI) and design improvements like pipelining or instruction caching, I
>would like to hear about examples of optimized algorithms implemented as
>digital circuits.

I would second your call for discussion in this area.

>	Donald E. Knuth, The Art of Computer Programming,
>	vol.2, Seminumerical Algorithms
>	page 278, section 4.3.3. How Fast Can We Multiply?
>Knuth talks about getting faster multiplication than order-n**2 behavior.
>He describes some methods of getting behavior of n**(lg3) or n**1.585

There is an algorithm that runs even faster than O(n**1.585). I don't have the reference in front of me now and I don't remember the exact time complexity, but it can be found in Aho, Ullman, and Hopcroft's book on algorithms. That algorithm is based on a number-theoretic FFT. It takes advantage of the similarity between multiplication and convolution in signal processing, and of the fact that the classic Convolution Theorem can be extended to abstract algebraic structures.
-- 
Disclaimer: I speak (and write) only for myself, not my employer.
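[Editor's sketch: the multiplication/convolution connection mentioned above can be made concrete. Treating each number as a list of digits, the digit sequence of the product is the acyclic convolution of the two digit sequences, followed by carry propagation. The FFT-based algorithms evaluate this same convolution in near-linear time; here it is done directly, just to show the correspondence. Function names are my own.]

```python
def digits(x, base=10):
    """Least-significant-first digit list of x."""
    d = []
    while x:
        x, r = divmod(x, base)
        d.append(r)
    return d or [0]

def multiply_by_convolution(u, v, base=10):
    """Multiply by convolving digit sequences, then propagating carries."""
    a, b = digits(u, base), digits(v, base)
    # acyclic convolution: c[k] = sum over i+j == k of a[i]*b[j]
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    # carry propagation turns the convolution back into a number
    result, carry = 0, 0
    for k, ck in enumerate(c):
        carry += ck
        carry, digit = divmod(carry, base)
        result += digit * base ** k
    return result + carry * base ** len(c)
```

The double loop is the O(n**2) schoolbook method in disguise; replacing it with a number-theoretic transform is exactly the speedup the FFT-based algorithms provide.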
preston@titan.rice.edu (Preston Briggs) (09/04/90)
In article <889@dg.dg.com> publius@dg-gw.dg.com (Publius) writes:
>In article <4047@husc6.harvard.edu> cleary@husc9.UUCP (Kenneth Cleary) writes:
>>	Donald E. Knuth, The Art of Computer Programming,
>>	vol.2, Seminumerical Algorithms
>>	page 278, section 4.3.3. How Fast Can We Multiply?
>>Knuth talks about getting faster multiplication than order-n**2 behavior.
>>He describes some methods of getting behavior of n**(lg3) or n**1.585
>There is an algorithm that can do even faster than O(n**1.585).
>I don't have the reference in front of me now and I don't remember
>the exact time complexity. But it can be found in Aho, Ullman, and Hopcroft's
>book on algorithms. That algorithm is based on number-theoretic FFT.
>It takes advantage of the similarity between multiplication and convolution in
>signal processing and the fact that the classic Convolution Theorem can be
>extended to the abstract algebraic structures.

It's called Sch\"{o}nhage-Strassen. You can use it to multiply two $n$-bit numbers in $O_{B}(n \log n \log\log n)$ steps. Unfortunately, this is the asymptotic complexity (as $n$ becomes very large). The key point is made in the following sentence: "The value of $n\/$ for which many of the algorithms described here become practical is enormous." That is, they aren't going to help us with 64-bit multiplication.
-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu
hays@isc.intel.com (Kirk Hays) (09/07/90)
In article <889@dg.dg.com> publius@dg-gw.dg.com (Publius) writes:
>In article <4047@husc6.harvard.edu> cleary@husc9.UUCP (Kenneth Cleary) writes:
>>Knuth talks about getting faster multiplication than order-n**2 behavior.
>>He describes some methods of getting behavior of n**(lg3) or n**1.585
>
>There is an algorithm that can do even faster than O(n**1.585).
>I don't have the reference in front of me now and I don't remember
>the exact time complexity. But it can be found in Aho, Ullman, and Hopcroft's
>book on algorithms. That algorithm is based on number-theoretic FFT.

Looking in my copy of Aho, Ullman, and Hopcroft (ISBN 0-201-0023-7), they discuss multiplication on pages 307-310, and claim the presented algorithm is O(n**1.59), which is sufficiently close to O(n**1.585) to lead me to believe that they are discussing the same algorithm as Knuth. I don't have my Knuth handy to verify this.

Have you a different reference for multiplication faster than O(n**1.59)?
-- 
Kirk Hays - NRA Life.  Dulce et Decorum est, pro patra mori.
"The way of the samurai is found in death.  It is this simple.  If you can
accept it then you will fight as though you are already dead."
	Tsunemoto Yamamoto, _Hagakure_
cik@l.cc.purdue.edu (Herman Rubin) (09/07/90)
In article <908@intelisc.isc.intel.com>, hays@isc.intel.com (Kirk Hays) writes:
> In article <889@dg.dg.com> publius@dg-gw.dg.com (Publius) writes:
> >In article <4047@husc6.harvard.edu> cleary@husc9.UUCP (Kenneth Cleary) writes:
> >>Knuth talks about getting faster multiplication than order-n**2 behavior.
> >>He describes some methods of getting behavior of n**(lg3) or n**1.585
			...................
> Have you a different reference for multiplication faster than O(n**1.59)?

Knuth describes faster ones. The fastest known algorithm, asymptotically, is the Sch"onhage-Strassen algorithm, which is O(n log n log log n). It is known that this is within a few powers of log log n of the best possible asymptotics (Cook and Aanderaa). But this says nothing about what to do with a given precision and a given architecture.

Even the apparently simple Karatsuba-Ofman method runs into questions of overflow and/or sign, which can make it slower than the usual method for a given precision. The n**1.585 algorithm is easily programmed, and I doubt that hardware could do much better than a microcoded program, except for adding one bit to numbers before multiplication to take care of the above problem.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317) 494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik (UUCP)
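[Editor's sketch of the overflow/sign issue Rubin alludes to, assuming h-bit hardware multipliers: the cross-term operands (U1-U0) and (V0-V1) each span the range +/-(2**h - 1), so they need h magnitude bits plus one sign bit, which is the "adding one bit" above. The function name and the sign/magnitude bookkeeping are mine.]

```python
def karatsuba_step(u, v, h=16):
    """One Karatsuba-Ofman step on 2h-bit operands, done the way h-bit
    hardware might: the cross-term differences are kept in sign/magnitude
    form, since feeding their two's-complement low h bits to an unsigned
    h-bit multiplier would give the wrong product when either is negative."""
    mask = (1 << h) - 1
    u1, u0 = u >> h, u & mask
    v1, v0 = v >> h, v & mask
    du, dv = u1 - u0, v0 - v1
    sign = -1 if (du < 0) != (dv < 0) else 1   # the one extra bit of state
    cross = sign * (abs(du) * abs(dv))         # h-bit x h-bit unsigned multiply
    return (((1 << 2 * h) + (1 << h)) * (u1 * v1)
            + (1 << h) * cross
            + ((1 << h) + 1) * (u0 * v0))
```

In software this bookkeeping is a compare and a conditional negate per level; in hardware it is one more bit through the multiplier array, which is presumably why Rubin doubts dedicated circuitry would beat microcode by much here.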
mshute@cs.man.ac.uk (Malcolm Shute) (09/14/90)
Going back to the original question in this discussion (which seems since to have veered towards one specific multiplication example):

In article <4047@husc6.harvard.edu> cleary@husc9.UUCP (Kenneth Cleary) writes:
>It is not difficult to find a few books on optimized algorithms implemented
>as software, but I'm having difficulty finding out about hardware aspects.

Can I express some degree of surprise at this request... I thought there was already a massive cross-traffic of ideas from hardware to software and back again. Normally, if something is originally implemented in one medium (hardware or software) first, when it comes to re-implementing it in the other, the initial approach is to mimic the original implementation (complete with any lessons learned about algorithmic improvements in that first medium).

For instance, the 'obvious' way to implement pseudo-random number generators in software is to use the '*2' (multiply-by-two) operation to set up shift registers in the program, just like the ring counters in the original hardware. Or, look at the way the original Unix encryption program 'simulated' the wheels on the Enigma machine. In the other direction, look at how floating point hardware started by implementing what was originally performed in subroutines. If someone subsequently publishes a paper on how to improve/replace the algorithm in the original implementation... this will have direct relevance to the subsequent implementation.

What I think you are more probably asking about, however, is whether there is a body of mathematics to manipulate hardware descriptions, just as computer science has already built up theorems to manipulate software descriptions. I am most familiar with the work of Dr. Sheeran (now at Glasgow University) for designing finite state machine hardware in general, and systolic array hardware in particular, using a language which is an extension of Backus' FFP:

Sheeran, M. (1984).
uFP, An Algebraic VLSI Design Language, Programming Research Group, Technical Monograph PRG-39, Oxford University Computing Laboratory, 150 pp.

Sheeran, M. (1985). Designing regular array architectures using higher order functions, in J.P. Jouannaud, ed., Functional Programming Languages and Computer Architecture, LNCS, Vol. 201, pp 220-237, Springer-Verlag, Berlin.

I know that others are working on languages which can prove theorems about other types of hardware. Prof. Sutherland refers to such in his Turing Award lecture:

Sutherland, I.E. (1989). Micropipelines, CACM, Vol. 32, No. 6, pp 720-738.
-- 
Malcolm SHUTE. (The AM Mollusc: v_@_ ) Disclaimer: all