frey@eecae.UUCP (Zachary Frey) (05/03/89)
In article <18589@gatech.edu> myke@gatech.UUCP (Myke Reynolds) writes:
>[ART's] memory capacity is no less than that of a linear filter, and its
>size is not limited, unlike BP. Since size = memory capacity, its memory
>capacity is limited only by your implementation of a linear equation solver.

I am not familiar with ART, but I am familiar with back-propagation from
the Rumelhart & McClelland PDP volumes, and I don't remember ever seeing
anything about a size limit on networks implemented with back-propagation.
Could you elaborate?

I am currently working on a simulation of feedforward networks using BP as
the learning rule; it should work for arbitrarily large networks (limited
by computer resources, of course). Since the equations involved are
recursively defined, I don't see why there should be a size limit on the
net.

Zach Frey
--
* U.S.nail: Zachary Frey          || e-mail: frey@frith.egr.msu.edu *
*           326 Abbot Hall        ||         frey@eecae.ee.msu.edu  *
*           E. Lansing, MI 48825  || voice:  (517)355-6421          *
* DISCLAIMER: My opinions, my responsibility.                       *
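For readers who want to experiment, here is a minimal sketch of the
recursively defined back-propagation update described above, in Python with
NumPy. The layer sizes, logistic activation, learning rate, and the XOR toy
task are illustrative assumptions, not details from the post.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """Fully connected feedforward net with an arbitrary list of layer sizes."""
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias vector per pair of adjacent layers.
        self.W = [rng.normal(0.0, 0.5, (m, n))
                  for n, m in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(m) for m in sizes[1:]]

    def forward(self, x):
        # Keep every layer's output; the backward recursion needs them.
        outs = [x]
        for W, b in zip(self.W, self.b):
            outs.append(sigmoid(W @ outs[-1] + b))
        return outs

    def train_step(self, x, target, lr=0.5):
        outs = self.forward(x)
        # Output-layer delta, then the recursion
        #   delta_l = (W_{l+1}^T delta_{l+1}) * f'(net_l),
        # which is what lets the same code handle any number of layers.
        delta = (outs[-1] - target) * outs[-1] * (1.0 - outs[-1])
        for l in range(len(self.W) - 1, -1, -1):
            grad_W = np.outer(delta, outs[l])
            if l > 0:   # propagate before overwriting this layer's weights
                prev_delta = (self.W[l].T @ delta) * outs[l] * (1.0 - outs[l])
            self.W[l] -= lr * grad_W
            self.b[l] -= lr * delta
            if l > 0:
                delta = prev_delta
        return 0.5 * np.sum((outs[-1] - target) ** 2)

# Toy usage: a 2-4-1 net trained on XOR.
if __name__ == "__main__":
    net = MLP([2, 4, 1])
    data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
    for epoch in range(5000):
        for x, t in data:
            net.train_step(np.array(x, float), np.array(t, float))
    for x, _ in data:
        print(x, net.forward(np.array(x, float))[-1])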
myke@gatech.edu (Mike Rynolds) (05/04/89)
In article <15313@eecae.UUCP> frey@eecae.UUCP (Zachary Frey) writes:
>I am not familiar with ART, but I am familiar with back-propagation from
>the Rumelhart & McClelland PDP volumes, and I don't remember ever seeing
>anything about a size limit on networks implemented with back-propagation.
>Could you elaborate?
>
Try increasing the number of internal nodes without changing the
input/output you train it on. If you were simulating more complex
input/output, an increased number of internal nodes would be necessary to
learn the greater complexity. But even without greater complexity you will
notice a rapid decrease in learning rate as a function of the number of
internal nodes, and at a certain point it stops learning altogether.
--
Myke Reynolds
School of Information & Computer Science, Georgia Tech, Atlanta GA 30332
uucp:     ...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!myke
Internet: myke@gatech.edu
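One way to run the experiment suggested above is to train the same fixed
task while only the hidden layer grows, and compare the final errors. A
sketch, assuming the hypothetical MLP class from the sketch after the
previous post is in scope; the unit counts, epoch budget, and XOR task are
arbitrary choices, and no particular outcome is claimed here.

import numpy as np

data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

for hidden in (2, 8, 32, 128):          # same task, growing hidden layer
    net = MLP([2, hidden, 1])
    for epoch in range(2000):
        err = sum(net.train_step(np.array(x, float), np.array(t, float))
                  for x, t in data)
    print("%4d hidden units -> summed error after training: %.4f"
          % (hidden, err))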
mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) (05/04/89)
In article <18615@gatech.edu> myke@gatech.UUCP (Mike Rynolds) writes:
>In article <15313@eecae.UUCP> frey@eecae.UUCP (Zachary Frey) writes:
>>I am not familiar with ART, but I am familiar with back-propagation from
>>the Rumelhart & McClelland PDP volumes, and I don't remember ever seeing
>>anything about a size limit on networks implemented with back-propagation.
>>Could you elaborate?
>>
>Try increasing the number of internal nodes without changing the
>input/output you train it on. If you were simulating more complex
>input/output, an increased number of internal nodes would be necessary to
>learn the greater complexity. But even without greater complexity you will
>notice a rapid decrease in learning rate as a function of the number of
>internal nodes, and at a certain point it stops learning altogether.

Yes, it starts to memorize once you have significantly more free parameters
than examples. I would think, though, that this is a fundamental
limitation: you can always fit a 5th-degree polynomial through 5 points,
for example. The same sort of thing should apply in networks... (plug for
my adviser's research). If you need significantly more free weights than
examples, then you're wasting weights, or the functional representation is
poor. And if you don't have enough examples, you won't be able to learn a
complicated function, because the given examples don't map out the input
space well enough.

>--
>Myke Reynolds
>School of Information & Computer Science, Georgia Tech, Atlanta GA 30332
>uucp:     ...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!myke
>Internet: myke@gatech.edu

Something I'd like to explore further is learning with a radial-basis
function network, i.e. a one-hidden-layer network where the hidden units'
input function is

    n_j = sum_i (o_i - w_ij)^2 ;   o_j = 1/(1 + n_j^2)

instead of the conventional n_j = sum_i o_i * w_ij. You can learn the
output layer of this network with a guaranteed conventional algorithm
(linear least squares via singular value decomposition) once you've
selected the centers, i.e. the first layer of weights, with k-means
clustering for example. With one hidden layer, this network can perform
complicated nonlinear transformations, unlike the simple perceptron. For
predicting chaotic time series, where the inherent locality of the
functional representation is an advantage, I've found this method to be
more accurate and faster to converge.

Matt Kennel
mbkennel@phoenix.princeton.edu
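A rough sketch of the radial-basis-function recipe described above: centers
picked by k-means, hidden activation 1/(1 + n^2) applied to the squared
distance n, and output weights solved by SVD-based linear least squares
(numpy.linalg.lstsq). The toy 1-D sine data, the hand-rolled k-means, and
the number of centers are my assumptions, not details from the post.

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, (200, 1))   # 200 one-dimensional inputs
y = np.sin(X[:, 0])                           # target function to learn

def kmeans(X, k, iters=20):
    # Plain k-means to choose the first-layer weights (the centers).
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def hidden(X, centers):
    # n_j = sum_i (o_i - w_ij)^2 ;  o_j = 1 / (1 + n_j^2), as in the post.
    n = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return 1.0 / (1.0 + n ** 2)

centers = kmeans(X, k=12)
H = np.hstack([hidden(X, centers), np.ones((len(X), 1))])  # plus a bias column
W, *_ = np.linalg.lstsq(H, y, rcond=None)                  # SVD least squares
print("training RMS error:", np.sqrt(np.mean((H @ W - y) ** 2)))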