tom@asi.com (Tom Baker) (03/16/91)
Recently Jacob Murre sent a copy of the comments he received from a request for references on limited-precision implementations of neural network algorithms. Here are the results that I received from an earlier request of my own. I have also collected a bibliography of papers on the subject; because of its length, I will not post it here. I am sending a copy to everyone who responded to my initial inquiry. If you want a copy of the references, send me a message and I will send it to you. If you already sent a request for the references and do not receive them within a few days, please send me another message. Thanks for all of the references and messages.

Thomas Baker                          INTERNET: tom@asi.com
Adaptive Solutions, Inc.              UUCP: (uunet,ogicse)!adaptive!tom
1400 N.W. Compton Drive, Suite 340
Beaverton, Oregon 97006

-----------------------------------------------------------------------

Hello Thomas.

Networks with binary weights (i.e., only +1 or -1) can be regarded as the extreme case of limited-precision networks (they also use binary threshold units). The interest in such networks is twofold: theoretical and practical. Nevertheless, not many learning algorithms exist for these networks. Since this is one of my major interests, I have a list of references for binary-weight algorithms, which I enclose. As far as I know, the only algorithms that train feed-forward networks with binary weights are based on the CHIR algorithm (Grossman 1989, Saad and Marom 1990, Nabutovsky et al. 1990). The CHIR algorithm is an alternative to BP that was developed by our group, and is capable of training feed-forward networks of binary (i.e., hard-threshold) units. The rest of the papers in the enclosed list deal with algorithms for the single binary-weight perceptron (a learning rule that can also be used for fully connected networks of such units). If you are interested, I can send you copies of our papers (Grossman, Nabutovsky et al.).
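[Ed. For readers new to binary-weight learning, a minimal toy sketch may help. The Python below trains a single hard-threshold unit whose forward pass uses only +1/-1 weights, obtained by thresholding a real-valued accumulator. The update rule and all names here are my own illustration, NOT the CHIR algorithm. ]

```python
import random

def sign(x):
    # hard-threshold unit: +1 or -1 (treat 0 as +1)
    return 1 if x >= 0 else -1

def train_binary_perceptron(samples, n_inputs, epochs=100, lr=0.1):
    """Perceptron-style training with weights constrained to +1/-1.

    A real-valued 'hidden' weight vector receives the updates, but the
    unit always computes with the signs of those values -- one simple
    way to keep the effective weights binary (an editorial toy example,
    not the CHIR rule).
    """
    hidden = [random.uniform(-1, 1) for _ in range(n_inputs)]
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            w = [sign(h) for h in hidden]  # binary weights used in the forward pass
            y = sign(sum(wi * xi for wi, xi in zip(w, x)))
            if y != target:
                errors += 1
                for i in range(n_inputs):  # perceptron update on the hidden weights
                    hidden[i] += lr * target * x[i]
        if errors == 0:
            break
    return [sign(h) for h in hidden]

# learn the 3-input majority function, whose generating weights are (+1, +1, +1)
xs = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
data = [(x, sign(sum(x))) for x in xs]
random.seed(0)
w = train_binary_perceptron(data, 3)
print(w)  # with this seed the learned binary weights are [1, 1, 1]
```

For a linearly separable target such as majority, the hidden vector converges and its sign pattern recovers the generating binary weights; more interesting problems are exactly where algorithms like CHIR are needed.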
I would also be very interested in what you find.

Best Regards,
Tal Grossman
Electronics Dept., Weizmann Inst. of Science
Rehovot 76100, ISRAEL

-----------------------------------------------------------------------

In article <894@adaptive.UUCP> you write:
>
>We use 16 bit weights, and 8 bit inputs and outputs.  We have found
>that this representation does as well as floating point for most of
>the data sets that we have tried.  I have also seen several other
>papers where 16 bit weights were used successfully.
>
>I am also trying to collect a bibliography on limited precision.  I
>would like to see the references that you get.  I do not have all of
>the references that I have in a form that can be sent out.  I will
>post them soon.  I would like to keep in touch with the people that
>are doing research in this area.
>
>Thomas Baker                          INTERNET: tom@asi.com
>Adaptive Solutions, Inc.              UUCP: (uunet,ogicse)!adaptive!tom
>1400 N.W. Compton Drive, Suite 340
>Beaverton, Oregon 97006

My research topic is VLSI implementation of neural networks, and hence I have done some study of precision requirements. My observations agree with yours, more or less (I have also read your paper titled "Characterization of Neural Networks"). I found that the required precision for weights is 11-12 bits for the fractional part. For outputs, 4-5 bits were sufficient, but for the backpropagated error the requirement was 7-8 bits (the values of these two do not exceed 1).

As an aside, you say in your paper (cited above) that you can train the network by accumulating weight changes as well as by accumulating the backpropagated error. While I was successful with weights, I was never able to train the network by accumulating the error. Could you give some more explanation on this matter? Accumulating the error would be very helpful from the point of view of VLSI implementation, since one need not wait for the error to be backpropagated at every epoch, and hence the throughput can be increased.
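[Ed. For concreteness, here is one way to mimic fixed-point formats like those under discussion in Python. The particular scale factors (12 and 7 fractional bits) are illustrative choices of mine, not anyone's actual hardware format. ]

```python
def quantize(x, bits, frac_bits):
    """Round x to a signed fixed-point value with the given total width
    and number of fractional bits, saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

# illustrative formats: 16-bit weights with 12 fractional bits,
# 8-bit activations with 7 fractional bits (magnitudes below 1.0)
w = quantize(0.7071067, bits=16, frac_bits=12)
a = quantize(0.3333333, bits=8, frac_bits=7)

print(w)  # 0.70703125  (nearest multiple of 2**-12)
print(a)  # 0.3359375   (nearest multiple of 2**-7)

# rounding error is bounded by half a least-significant bit
assert abs(w - 0.7071067) <= 0.5 / 2**12
assert abs(a - 0.3333333) <= 0.5 / 2**7
```

The half-LSB error bound is what makes "bits after the binary point" the right way to compare the weight, output, and error precisions quoted in these messages.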
Thanks in advance,
Arun  (e-mail address: arun@vlsi.waterloo.edu)

[Ed. Oops!  He's right, accumulating the error without propagating it doesn't work.  TB ]

-----------------------------------------------------------------------

I have done some precision-effects simulations. I was concerned with an analog/digital hybrid architecture, which drove me to examine three areas of precision constraints:

1) calculation precision -- the same as weight-storage precision, and essentially the precision necessary in the backprop calculations;
2) feedforward weight precision -- the precision necessary in calculating an activation level;
3) output precision -- the precision necessary both in the feedforward calculations and in calculating weight changes.

My results were not much better than those you mentioned -- 13 bits were required for weight storage and delta-w calculations. I have a feeling that you wanted to see something much more optimistic. I should say that I was examining problems more related to signal processing and was very concerned with obtaining a low RMS error. Another study, which looked more at classification problems and was concerned with correct classification rather than RMS error, got more optimistic results -- I believe 8-10 bit calculations (but don't quote me). Those results originally appeared in a master's thesis by Sheldon Gilbert at MIT; I believe his thesis is available as Lincoln Lab tech report #810, dated 11/18/88. The results of my study appear in the fall '90 issue of "Neural Computation". As I work on a chip design I continually confront this effect, and would be happy to discuss it further if you care to.

Best Regards,
Paul Hollis
Dept. of ECE, North Carolina State Univ.
pwh@ecebowie.ncsu.edu
(919) 737-7452

-----------------------------------------------------------------------

Hi!

Yun-shu Peter Chiou told me you have done something on finite-word-length BP. Right now I'm also working on a subject related to that area for my master's thesis.
Could you give me a copy of your master's thesis? Of course, I will pay for it. Your reply will be greatly appreciated.

Jennifer Chen-ping Feng

-----------------------------------------------------------------------

My group is interested in quantisation effects from both a theoretical and a practical point of view. The theoretical aspects are very challenging. I would appreciate hearing of any papers you are aware of in this area.

Regards,
Marwan Jabri

-----------------------------------------------------------------------

Dear connectionist researchers,

We are in the process of designing a new neurocomputer. An important design consideration is precision: should we use 1-bit, 4-bit, 8-bit, etc. representations for weights, activations, and other parameters? We are scaling up our present neurocomputer, the BSP400 (Brain Style Processor with 400 processors), which uses 8-bit internal representations for activations and weights, but exchanges activations as single bits (using partial time-coding induced by floating thresholds). This scheme does not scale well.

Though we have tracked down scattered remarks in the literature on precision, we have not been able to find many systematic studies on this subject. Does anyone know of systematic simulations or analytical results on the effect of implementation precision on the performance of a neural network? In particular, we are interested in the question of how (and to what extent) limited-precision (e.g., 8-bit) implementations deviate from, say, 8-byte double-precision implementations. The only systematic studies we have been able to find so far deal with fault tolerance, which is only of indirect relevance to our problem:

Brause, R. (1988). Pattern recognition and fault tolerance in non-linear neural networks. Biological Cybernetics, 58, 129-139.

Jou, J., & Abraham, J.A. (1986). Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures. Proceedings of the IEEE, 74, 732-741.

Moore, W.R. (1988). Conventional fault-tolerance and neural computers. In: R. Eckmiller & C. von der Malsburg (Eds.), Neural Computers. NATO ASI Series, F41 (Berlin: Springer-Verlag), 29-37.

Nijhuis, J., & Spaanenburg, L. (1989). Fault tolerance of neural associative memories. IEE Proceedings, 136, 389-394.

Thanks!

Jacob M.J. Murre
Unit of Experimental and Theoretical Psychology
Leiden University
P.O. Box 9555
2300 RB Leiden, The Netherlands
E-Mail: MURRE@HLERUL55.Bitnet

-----------------------------------------------------------------------

We are in the process of finishing up a paper which gives a theoretical (systematic) derivation of finite-precision neural network computation. The idea is a nonlinear extension of the "general compound operators" widely used for error analysis of linear computation. We derive several mathematical formulas for both the retrieving and the learning phases of neural networks. The finite-precision error in the retrieving phase can be written as a function of several parameters, e.g., the number of bits of the weights, the number of bits for multiplication and accumulation, the size of the nonlinear table look-up, and truncation/rounding or jamming approaches. We are then able to extend this retrieving-phase error analysis to iterative learning, to predict the necessary number of bits. This can be shown using a ratio between the finite-precision error and the (floating point) backpropagated error. Simulations have been conducted and match the theoretical prediction quite well. Hopefully, we can have a final version of this paper available to you soon.

Jordan L.
Holt and Jenq-Neng Hwang, "Finite Precision Error Analysis of Neural Network Hardware Implementation," University of Washington, FT-10, Seattle, WA 98195.

Best Regards,
Jenq-Neng

-----------------------------------------------------------------------

Dear Baker,

I saw the following article on the internet news:

-----
Yun-Shu Peter Chiou (yunshu@eng.umd.edu) writes:
> Does anyone out there have any references or have done any works
> on the effects of finite word length arithmetic on Back-Propagation.

I have done a lot of work with BP using limited-precision calculations. My master's thesis was on the subject, and last summer Jordan Holt worked with us to run a lot of benchmark data on our limited-precision simulator. We are submitting a paper on Jordan's results to IJCNN '91 in Seattle.

We use 16 bit weights, and 8 bit inputs and outputs. We have found that this representation does as well as floating point for most of the data sets that we have tried. I have also seen several other papers where 16 bit weights were used successfully.

I am also trying to collect a bibliography on limited precision. I would like to see the references that you get. I do not have all of the references that I have in a form that can be sent out. I will post them soon. I would like to keep in touch with the people that are doing research in this area.
-----

Could I have a copy of your article sent to me? At the moment I am writing a survey of the use of SIMD computers for the simulation of artificial neural networks. As many SIMD computers are bit-serial, I think precision is an important aspect of the algorithms. I have found some articles that discuss low-precision neural networks, and I have included references to them at the end of my letter. If you have a compilation of other references that you recommend, could you please send the list to me?

advTHANKSance,
Tomas Nordstrom
---
Tomas Nordstrom                       Tel:   +46 920 91061
Dept. of Computer Engineering         Fax:   +46 920 98894
Lulea University of Technology        Telex: 80447 LUHS
S-95187 Lulea, SWEDEN                 Email: tono@sm.luth.se (internet)
---

-----------------------------------------------------------------------

In a recent post on the connectionists mailing list you were quoted as follows:

"... We have found that for backprop learning, between twelve and sixteen bits are needed. ... One method that optical and analog engineers use is to calculate the error by running the feedforward calculations with limited precision, and learning the weights with higher precision ..."

I am currently doing research on optical implementations of associative memories. The method that I am researching iteratively calculates a memory matrix that is fairly robust. However, when I quantize the process during learning, the entire method fails. I was wondering if you knew of someone who has had similar problems with quantization during training.

Thank you,
Karen Haines

-----------------------------------------------------------------------

Hi Mr. Baker,

I am working on a project wherein we are attempting to study the implications of using limited precision while implementing backpropagation. I read a message from Jacob Murre that said that you were maintaining a distribution list of persons interested in this field. Would you kindly add me to that list? My email address is ljd@mrlsun.sarnoff.com

Thanks,
Leslie Dias
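[Ed. The mixed-precision trick quoted in Karen's message -- feed forward at low precision, accumulate weights at higher precision -- can be sketched as follows. The delta-rule example and the 8-bit format are my own illustration, not any specific optical or analog design. ]

```python
def quantize(x, bits, frac_bits):
    # signed fixed-point rounding with saturation
    scale = 1 << frac_bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(x * scale))) / scale

def train_mixed_precision(samples, lr=0.05, epochs=200):
    """Delta-rule training of a single linear weight.

    The forward pass sees only an 8-bit version of the weight, as
    limited-precision hardware would, but updates are accumulated at
    high precision so that steps smaller than one LSB are not lost.
    """
    w_hi = 0.0                                       # high-precision accumulator
    for _ in range(epochs):
        for x, t in samples:
            w_lo = quantize(w_hi, bits=8, frac_bits=7)  # weight the hardware sees
            y = w_lo * x                                # low-precision forward pass
            w_hi += lr * (t - y) * x                    # high-precision update
    return w_hi, quantize(w_hi, bits=8, frac_bits=7)

# fit y = 0.4321 * x from noiseless examples
samples = [(k / 10, 0.4321 * k / 10) for k in range(-10, 11)]
w_hi, w_lo = train_mixed_precision(samples)
print(w_hi, w_lo)  # both land within one 8-bit LSB (1/128) of 0.4321
```

If the accumulator itself is quantized to 8 bits, updates smaller than half an LSB round to zero and learning stalls short of the target; that lost-update effect is one plausible reason a method that works in floating point can fail outright when quantized during training.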