[comp.ai.neural-nets] More limited precision responses

tom@asi.com (Tom Baker) (03/16/91)

Recently Jacob Murre sent a copy of the comments that he received from
a request for references on limited precision implementations of
neural network algorithms.  Here are the results that I received from
an earlier request.  I have also collected a bibliography of papers on
the subject.  Because of the length of the bibliography, I will not
post it here.  I am sending a copy to everyone who responded to my initial
inquiry.  If you want a copy of the references, send me a message, and
I will send it to you.  If you already sent a request for the
references and do not receive them in a few days, then please send me
another message.

Thanks for all of the references and messages.


Thomas Baker                    INTERNET: tom@asi.com  
Adaptive Solutions, Inc.	UUCP:  (uunet,ogicse)!adaptive!tom
1400 N.W. Compton Drive, Suite 340      
Beaverton, Oregon 97006

-----------------------------------------------------------------------

Hello Thomas.
             Networks with binary weights (i.e., only +1 or -1) can be
regarded as the extreme case of limited-precision networks (they also
use binary threshold units). The interest in such networks is twofold:
theoretical and practical. Nevertheless, not many learning algorithms
exist for these networks.  Since this is one of my major interests, I
have a list of references for binary-weight algorithms, which I enclose.

As far as I know, the only algorithms which train feed-forward networks
with binary weights are based on the CHIR algorithm (Grossman 1989,
Saad and Marom 1990, Nabutovsky et al. 1990). The CHIR algorithm is an
alternative to BP that was developed by our group, and it is capable of
training feed-forward networks of binary (i.e., hard-threshold) units.
The rest of the papers in the enclosed list deal with algorithms for
the single binary-weight perceptron (a learning rule which can also
be used for fully connected networks of such units).
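
[Ed. For readers who have not seen these networks, here is a minimal
sketch (mine, in Python with made-up data, not code from Grossman's
group) of the kind of unit involved: weights restricted to +1/-1 and a
hard-threshold output.  It only illustrates the representation, not the
CHIR learning rule itself.  TB ]

import numpy as np

def binary_unit(x, w, theta=0.0):
    """Hard-threshold unit: inputs and weights are +1/-1 values."""
    return 1 if np.dot(w, x) - theta >= 0 else -1

# Illustrative example with random made-up values
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=8)   # binary inputs
w = rng.choice([-1, 1], size=8)   # binary weights
print(binary_unit(x, w))          # prints 1 or -1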

If you are interested, I can send you copies of our papers (Grossman,
Nabutovsky et al.).  I would also be very interested in what you find.
Best Regards
            Tal Grossman
            Electronics Dept.
            Weizmann Inst. of Science
            Rehovot 76100,  ISRAEL.

-----------------------------------------------------------------------


In article <894@adaptive.UUCP> you write:
>
>We use 16 bit weights, and 8 bit inputs and outputs.  We have found
>that this representation does as well as floating point for most of
>the data sets that we have tried.  I have also seen several other
>papers where 16 bit weights were used successfully.
>
>I am also trying to collect a bibliography on limited precision.  I
>would like to see the references that you get.  Not all of the
>references I have are in a form that can be sent out yet.  I will
>post them soon.  I would like to keep in touch with the people that
>are doing research in this area.
>
>Thomas Baker                    INTERNET: tom@asi.com  
>Adaptive Solutions, Inc.	UUCP:  (uunet,ogicse)!adaptive!tom
>1400 N.W. Compton Drive, Suite 340      
>Beaverton, Oregon 97006

My research topic is VLSI implementation of neural networks, and hence I have
done some study of precision requirements. My observations agree with yours
more or less (I have also read your paper titled "Characterization
of Neural Networks"). I found that weights need 11-12 bits of precision for
the fractional part. For outputs, 4-5 bits were sufficient, but for the
backpropagated error, the precision requirement was 7-8 bits (neither of
these values exceeds 1).
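
[Ed. As an illustration of what "bits for the fractional part" means here,
this is a rough Python sketch (mine, with made-up numbers, not Arun's
simulator) of rounding values to a fixed number of fractional bits.  TB ]

import numpy as np

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with step size 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return np.round(x / step) * step

# Made-up values, quantized to the precisions mentioned above
w   = quantize(np.array([0.3141, -1.2718, 0.0005]), 12)  # weights: 11-12 bits
out = quantize(np.array([0.42, 0.07]), 5)                # outputs: 4-5 bits
err = quantize(np.array([0.0123, -0.0042]), 8)           # error:   7-8 bits
print(w, out, err)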

As an aside, you say in your paper (cited above) that the network can be
trained by accumulating weight changes as well as backpropagated error. While
I was successful with weight changes, I was never able to train the network by
accumulating the error. Could you explain this in more detail? Accumulating the
error would be very helpful from the point of view of VLSI implementation,
since one need not wait for the error to be backpropagated at every epoch, and
hence the throughput can be increased.

Thanks in advance,

Arun

(e-mail address: arun@vlsi.waterloo.edu)

[Ed. Oops! He's right, accumulating the error without propagating it 
doesn't work.  TB ]

-----------------------------------------------------------------------


I have done some precision effects simulations.  I was concerned with an
analog/digital hybrid architecture, which drove me to examine three areas
of precision constraints: 1) calculation precision--the same as weight
storage precision and essentially the precision necessary in the backprop
calculations, 2) feedforward weight precision--the precision necessary
in calculating an activation level, 3) output precision--the precision
necessary in both the feedforward calculations and in calculating
weight changes.  My results were not much better than you mentioned--13
bits were required for weight storage/delta-w calculations.  I have a
feeling that you wanted to see something much more optimistic.  I should
say that I was examining problems more related to signal processing and
was very concerned with obtaining a low RMS error.  Another study which
looked more at classification problems and was concerned with correct
classification and not necessarily RMS error got more optimistic
results--I believe 8-10 bit calculations (but don't quote me).  Those
results originally appeared in a master's thesis by Sheldon Gilbert at
MIT.  His thesis, I believe, is available as Lincoln Lab tech report
#810.  The date of that TR is 11/18/88.  The results of my study appear
in the fall '90 issue of "Neural Computation".  As I work on a chip
design, I continually confront this effect and would be happy to discuss
it further if you care to.
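
[Ed. A rough sketch of why weight storage precision tends to be the tight
constraint: with too few stored bits, a small delta-w rounds away and the
weight never moves.  This is my own toy Python example, not Paul's
simulation.  TB ]

def quantize(w, frac_bits):
    """Round w to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(w / step) * step

w0, dw, steps = 0.125, 2e-4, 100   # made-up weight and update size
for frac_bits in (8, 13, 16):
    w = w0
    for _ in range(steps):
        w = quantize(w + dw, frac_bits)
    # full precision gives 0.145; at 8 bits the weight stays stuck at 0.125
    print(frac_bits, w)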
Best Regards,
Paul Hollis
Dept. of ECE
North Carolina State Univ.
pwh@ecebowie.ncsu.edu
(919) 737-7452

-----------------------------------------------------------------------

Hi!
	Yun-shu Peter Chiou told me you have done some work on finite word
length BP. Right now, I'm also working on a subject related to that area
for my master's thesis. Could you send me a copy of your master's thesis?
Of course, I'm willing to pay for it. Your reply will be greatly appreciated.

						Jennifer Chen-ping Feng

-----------------------------------------------------------------------

My group is interested in quantisation effects from both a theoretical and a
practical point of view. The theoretical aspects are very challenging.
I would appreciate hearing about any papers you are aware of in this area.

Regards,

Marwan Jabri

-----------------------------------------------------------------------

Dear connectionist researchers,
 
We are in the process of designing a new neurocomputer. An important
design consideration is precision: Should we use 1-bit, 4-bit,
8-bit, etc. representations for weights, activations, and other
parameters? We are scaling up our present neurocomputer, the BSP400
(Brain Style Processor with 400 processors), which uses 8-bit internal
representations for activations and weights, but activations are
exchanged as single bits (using partial time-coding induced by floating
thresholds). This scheme does not scale well.
 
Though we have tracked down scattered remarks in the literature on
precision, we have not been able to find many systematic studies on this
subject. Does anyone know of systematic simulations or analytical
results of the effect of implementation precision on the performance of
a neural network? In particular, we are interested in the question of how
(and to what extent) limited-precision (e.g., 8-bit) implementations
deviate from, say, 8-byte double-precision implementations.
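 
[Ed. One way to get a first answer to this question is to run the same
forward pass twice, once in double precision and once with weights and
activations rounded to 8 bits, and measure the deviation.  A rough Python
sketch with made-up weights (my illustration only, not the BSP400):  TB ]

import numpy as np

def quantize(x, bits, scale):
    """Uniform quantizer: 2**bits levels over [-scale, scale)."""
    step = 2.0 * scale / (2 ** bits)
    return np.clip(np.round(x / step) * step, -scale, scale - step)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 16))     # made-up input patterns
W = rng.normal(0, 0.5, size=(16, 4))       # made-up weight matrix

y_ref = np.tanh(x @ W)                     # 8-byte double precision reference
y_q = np.tanh(quantize(x, 8, 1.0) @ quantize(W, 8, 2.0))
y_q = quantize(y_q, 8, 1.0)                # 8-bit weights and activations

print("RMS deviation:", np.sqrt(np.mean((y_q - y_ref) ** 2)))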
 
The only systematic studies we have been able to find so far deal with
fault tolerance, which is only of indirect relevance to our problem:
 
Brause, R. (1988). Pattern recognition and fault tolerance in non-linear
  neural networks. Biological Cybernetics, 58, 129-139.
 
Jou, J., & J.A. Abraham (1986). Fault-tolerant matrix arithmetic and
  signal processing on highly concurrent computing structures. Proceedings
  of the IEEE, 74, 732-741.
 
Moore, W.R. (1988). Conventional fault-tolerance and neural computers.
  In: R. Eckmiller & C. von der Malsburg (Eds.), Neural Computers. NATO
  ASI Series, F41 (Berlin: Springer-Verlag), 29-37.
 
Nijhuis, J., & L. Spaanenburg (1989). Fault tolerance of neural
  associative memories. IEE Proceedings, 136, 389-394.
 
 
Thanks!

Jacob M.J. Murre
Unit of Experimental and Theoretical Psychology
Leiden University
P.O. Box 9555
2300 RB Leiden
The Netherlands
 
E-Mail: MURRE@HLERUL55.Bitnet

-----------------------------------------------------------------------

We are in the process of finishing up a paper which gives
a theoretical (systematic) derivation of finite precision
neural network computation.  The idea is a nonlinear extension
of the "general compound operators" widely used for error analysis
of linear computation.  We derive several mathematical formulas
for both the retrieving and learning phases of neural networks.  The
finite precision error in the retrieving phase can be written
as a function of several parameters, e.g., the number of bits of the
weights, the number of bits for multiplication and accumulation, the
size of the nonlinear table look-up, and the truncation/rounding or
jamming approach.  Then we are able to extend this retrieving
phase error analysis to iterative learning to predict the necessary
number of bits.  This can be shown using a ratio between the
finite precision error and the (floating point) back-propagated
error.  Simulations have been conducted and matched the theoretical
prediction quite well.  Hopefully, we can have a final version of
this paper available to you soon.
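
[Ed. Not the authors' derivation, but a toy Python example (mine) of the
kind of ratio described: the quantization error in the backpropagated
error signal, relative to the floating-point error signal itself.  TB ]

import numpy as np

def quantize(x, frac_bits):
    """Round to a fixed-point grid with resolution 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return np.round(x / step) * step

rng = np.random.default_rng(1)
delta_fp = rng.normal(0.0, 0.01, size=1000)   # made-up backpropagated errors

for frac_bits in (4, 8, 12):
    finite_precision_error = quantize(delta_fp, frac_bits) - delta_fp
    ratio = np.linalg.norm(finite_precision_error) / np.linalg.norm(delta_fp)
    # as the ratio approaches 1, the update is dominated by quantization noise
    print(frac_bits, ratio)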

Jordan L. Holt and Jenq-Neng Hwang ,"Finite Precision Error
Analysis of Neural Network Hardware Implementation,"
University of Washington, FT-10, Seattle, WA 98195

Best Regards,

Jenq-Neng

-----------------------------------------------------------------------


Dear Baker,

I saw the following article on the internet news: 
-----
Yun-Shu Peter Chiou (yunshu@eng.umd.edu) writes:
> Does anyone out there have any references or have done any works
> on the effects of finite word length arithmetic on Back-Propagation.

I have done a lot of work with BP using limited precision
calculations.  My masters thesis was on the subject, and last summer
Jordan Holt worked with us to run a lot of benchmark data on our
limited precision simulator.  We are submitting a paper on Jordan's
results to IJCNN '91 in Seattle.

We use 16 bit weights, and 8 bit inputs and outputs.  We have found
that this representation does as well as floating point for most of
the data sets that we have tried.  I have also seen several other
papers where 16 bit weights were used successfully.

I am also trying to collect a bibliography on limited precision.  I
would like to see the references that you get.  Not all of the
references I have are in a form that can be sent out yet.  I will
post them soon.  I would like to keep in touch with the people that
are doing research in this area.
-----

Could you send me a copy of your article?

At the moment I am writing a survey of the use of SIMD computers for
the simulation of artificial neural networks. As many SIMD computers are
bit-serial, I think precision is an important aspect of the algorithms.

I have found some articles that discuss low-precision neural networks and
have included references to them at the end of my letter. If you have a
compilation of other references that you recommend, could you please send
the list to me?

advTHANKSance
Tomas Nordstrom

---
Tomas Nordstrom                     Tel:   +46 920 91061
Dept. of Computer Engineering       Fax:   +46 920 98894
Lulea University of Technology      Telex: 80447 LUHS
S-95187 Lulea, SWEDEN               Email: tono@sm.luth.se  (internet)
---

-----------------------------------------------------------------------


In a recent post on the connectionists mailing list you were
quoted as follows...

" ... We have found that for backprop learning, between
twelve and sixteen bits are needed. ...One method
that optical and analog engineers use is to calculate the error by
running feed-forward calculations with limited precision, and
learning weights with higher precision..."

I am currently doing research on optical implementations of
associative memories. The method that I am researching iteratively
calculates a memory matrix that is fairly robust. However, when
I quantize the process during learning, the entire method fails.
I was wondering if you know of someone who has had similar problems
with quantization during training.

Thank you,
Karen Haines

-----------------------------------------------------------------------

Hi Mr. Baker,
I am working on a project in which we are attempting
to study the implications of using limited precision
when implementing backpropagation.
I read a message from Jacob Murre that said you were
maintaining a distribution list of people interested
in this field.  Would you kindly add me to that
list?  My email address is ljd@mrlsun.sarnoff.com

Thanks

Leslie Dias