apr@cbnewsl.ATT.COM (anthony.p.russo) (08/24/89)
In article <6980@sdcsvax.UCSD.Edu>, heirich@beowulf.ucsd.edu (Alan Heirich) writes:
> [...] I have not seen any literature
> proving bounds on learnability.  This is obviously a very important area,
> but if people are still quibbling about the prevalence of local minima in
> error landscapes, I can't imagine that anyone has proven such a result about
> which functions are learnable!

Some things have been proven not to be learnable, i.e. EX-OR.  On the other
hand, I'm sure no one has proven whether step-function scaling is learnable
or not, but intuitively it seems that it is.  Perhaps not by backprop ...

There is, happily, a small body of research in this area, but not nearly
enough.  The best reference I can give is:

L.G. Valiant, "A Theory of the Learnable," Communications of the ACM,
vol. 27, no. 11, 1984.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  AT&T Bell Laboratories                                        ~
~  apr@cbnewsl.ATT.COM                                           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vaidyana@cb.ecn.purdue.edu (Ramaswamy Vaidyanathan) (08/24/89)
In a recent article, Heirich writes:
> This statement makes me uncomfortable.  It has been proven (White) that a
> two-layer feed forward network with a sigmoid activation function (such as
> you find in standard back propagation networks) can approximate any Borel
> [...]

Could you please post where this paper by White was published.  I have not
seen this paper; it could be useful in my work.  Thanks in advance.

Vaidya.
-------------------------
demers@beowulf.ucsd.edu (David E Demers) (08/25/89)
In article <1602@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>In article <6980@sdcsvax.UCSD.Edu>, heirich@beowulf.ucsd.edu (Alan Heirich) writes:
>> [...] I have not seen any literature
>> proving bounds on learnability.  This is obviously a very important area,
                   ^
	I think Alan may mean "learnability of the step function" here.

>> but if people are still quibbling about the prevalence of local minima in
>> error landscapes, I can't imagine that anyone has proven such a result about
>> which functions are learnable!

>Some things have been proven not to be learnable, i.e. EX-OR.

Surely you jest.  XOR is not learnable by a neural net with no hidden
layers, but is certainly a learnable function.

>On the other hand, I'm sure no one has proven whether step-function
>scaling is learnable or not, but intuitively it seems that it is.
[...]
>the best reference I can give is:
>L.G. Valiant, "A Theory of the Learnable," Communications of the ACM,
>vol. 27, no. 11, 1984.

Also interesting are:

Ehrenfeucht, Haussler, Kearns & Valiant, "A General Lower Bound on the
Number of Examples Needed for Learning," appearing in Information and
Computation, 1988 [I just have a preprint...]

and

Haussler, "Quantifying Inductive Bias: AI Learning Algorithms and
Valiant's Learning Framework," Artificial Intelligence 36 (1988).

The proceedings of the 1988 workshop on Computational Learning Theory
are available from Morgan Kaufmann.  One of my summer projects was to
read several of the papers from the workshop... maybe at Christmas...

But back to the original subject - I think Alan Heirich is right that
the step function may be APPROXIMATED to arbitrary precision by a net
with one hidden layer and sigmoid transfer functions.  Backpropagation
may not be an adequate learning method; however the cited work by White
(appearing in the current issue of Neural Networks, I believe {vol. 2,
no. 3}) shows that the function IS learnable.  This is stronger than
showing that a net can approximate it.
So, given the function and given a bound on the error, there is a net
which can approximate the function to that bound, and an algorithm to
learn the mapping.  Our job is to discover them :-)

Dave DeMers
demers@cs.ucsd.edu
Dept. of Computer Science & Engineering
UCSD
La Jolla, CA 92093
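[The approximation half of the claim is easy to see concretely.  The sketch below is illustrative code, not from the thread: a single sigmoid unit with gain k and threshold t, sigmoid(k*(x - t)), converges to the unit step at t for every x away from the jump as k grows.]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One "hidden unit" with gain k and threshold t: sigmoid(k * (x - t)).
# As k grows, this approaches the unit step at t for every x != t.
def step_approx(x, t=0.0, k=50.0):
    return sigmoid(k * (x - t))

xs = np.linspace(-1.0, 1.0, 201)
away = np.abs(xs) > 0.1                      # stay a little off the jump
worst = np.max(np.abs(step_approx(xs[away]) - (xs[away] > 0)))
print(worst)                                 # already below 0.01 at k = 50
```

Uniform approximation on any region excluding the discontinuity follows by taking k large enough; near the jump itself no continuous function can match the step exactly.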
apr@cbnewsl.ATT.COM (anthony.p.russo) (08/29/89)
In article <6981@sdcsvax.UCSD.Edu>, demers@beowulf.ucsd.edu (David E Demers) writes:
> >Some things have been proven not to be learnable, i.e. EX-OR.
>
> Surely you jest.  XOR is not learnable by a neural net with
> no hidden layers, but is certainly a learnable function.

X-OR is not learnable.  If you are given the first three entries in the
truth table, you could not possibly generalize to the last entry with any
confidence.  That is, of the two possibilities for the last entry, neither
is preferable.  X-OR is *memorizable*, not learnable.

In either case, learnability is not a requirement for neural net
applicability.  Nets memorize well.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  apr@cbnewsl.ATT.COM                                           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ackley@jeeves.bellcore.com (David H Ackley) (08/30/89)
In article <1667@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>X-OR is not learnable.  If you are given the first three entries in the truth
>table, you could not possibly generalize to the last entry with
>any confidence.

This renders the term "learnable" meaningless.  Over the space of
boolean functions, your claim is equally true of all of them.  Without
some kind of preference structure or bias over the space, both outputs
are equally likely for the last row, regardless of the rest of the
table.  If you do allow such biases, I can pick one (say, a "parity
bias") that makes XOR "learnable".

Absolutely, generalization is a big important hard problem --- but it's
different from learnability, and it's not why XOR is famous.

| David Ackley         Cognitive Science Research Group |
| "To act or        Bell Communications Research Inc.   |
|  to react,             ackley@flash.bellcore.com      |
|  that is the question"       ...!bellcore!ackley      |
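[Ackley's point can be checked by brute force.  This sketch, illustrative code rather than anything posted in the thread, enumerates all 16 boolean functions of two inputs and keeps those consistent with the first three rows of the XOR table: exactly two survive, and they disagree on the fourth row, so without a bias neither completion is preferable.]

```python
from itertools import product

# Encode each boolean function of two inputs as its output column for
# the inputs (0,0), (0,1), (1,0), (1,1).  There are 2^4 = 16 of them.
functions = list(product([0, 1], repeat=4))

# Observe only the first three rows of the XOR truth table: 0, 1, 1.
observed = [0, 1, 1]

consistent = [f for f in functions if list(f[:3]) == observed]
print(len(consistent))             # 2 candidate functions remain
print({f[3] for f in consistent})  # they disagree on the last row: {0, 1}
```

The same count holds for any three observed rows of any truth table, which is exactly the sense in which Russo's argument applies to every boolean function, not just XOR.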
apr@cbnewsl.ATT.COM (anthony.p.russo) (08/30/89)
> In article <1667@cbnewsl.ATT.COM> I wrote:
> >X-OR is not learnable.  If you are given the first three entries in the truth
> >table, you could not possibly generalize to the last entry with
> >any confidence.

In article <17522@bellcore.bellcore.com>, ackley@jeeves.bellcore.com (David H Ackley) writes:
> This renders the term "learnable" meaningless.  Over the space of
> boolean functions, your claim is equally true of all of them.  Without
> some kind of preference structure or bias over the space, both outputs
> are equally likely for the last row, regardless of the rest of the
> table.  If you do allow such biases, I can pick one (say, a "parity
> bias") that makes XOR "learnable".

I agree that X-OR is learnable *with* bias.  That, however, is not the
definition of "learnable" in the strictest sense.  I won't attempt to
prove or disprove your claim regarding all boolean functions, but why
must boolean functions be learnable at all?

When you saw X-OR for the first time, you most likely *memorized* the
truth table.  If not, then after *all four* entries, you determined the
rule that if both inputs are equal, output "0"; else output "1".  Could
you have determined this rule with just three examples?  No.  How would
you have chosen a bias?

Perhaps our definitions of "learnable" are different.  Mine is that,
with a fraction of the possible samples, one can generalize to 100%
accuracy.  Otherwise, for example, if after 99 of 100 samples one cannot
correctly predict the last output with 100% confidence, nothing has been
learned at all about the function in question.  If it does take 100% of
the possibilities to learn something, then I claim it has not been
learned, but rather *memorized*.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  apr@cbnewsl.ATT.COM     put disclaimer here.                  ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mathew@jane.Jpl.Nasa.Gov (Mathew Yeates) (08/30/89)
>Perhaps our definitions of "learnable" are different.  Mine is that,
>with a fraction of the possible samples, one can generalize to 100%
>accuracy.  Otherwise, for example, if after 99 of 100 samples
>one cannot correctly predict the last output
>with 100% confidence, nothing has been learned
>at all about the function in question.  If it does take 100% of the
>possibilities to learn something, then I claim it has not been learned,
>but rather *memorized*.

Under this definition, no function is learnable.  Considering a function
as a set of ordered pairs, seeing some subset of the ordered pairs can
never lead to a complete knowledge of the entire set.  At least for a
mere mortal.  A rigorous definition of learnability from examples can be
found in Valiant's "A Theory of the Learnable".
dg1v+@andrew.cmu.edu (David Greene) (08/31/89)
> Excerpts from netnews.comp.ai.neural-nets: 30-Aug-89 Re: : Step Function
> anthony.p.russo@cbnewsl. (1967)
> Perhaps our definitions of "learnable" are different.  Mine is that,
> with a fraction of the possible samples, one can generalize to 100%
> accuracy.  Otherwise, for example, if after 99 of 100 samples
> one cannot correctly predict the last output
> with 100% confidence, nothing has been learned
> at all about the function in question.  If it does take 100% of the
> possibilities to learn something, then I claim it has not been learned,
> but rather *memorized*.

I believe you are confusing 'sample' with 'census'.  If the total
population of examples is 100 (and assuming no noise) then perhaps your
statement would be appropriate.  However, if the 100 represents a
sample, then the issue of learnability after 99 examples from that
sample depends on the size of the total population and the degree of
confidence you require.  (e.g. Valiant's PAC model of learnability --
Valiant, L. G., "A Theory of the Learnable," Communications of the ACM,
27 (11): 1134-1142, November 1984)

-David
--------------------------------------------------------------------
David Perry Greene       ||  ARPA:   dg1v@andrew.cmu.edu
GSIA /Robotics           ||          dpg@isl1.ri.cmu.edu
Carnegie Mellon Univ.    ||  BITNET: dg1v%andrew@vb.cc.cmu.edu
Pittsburgh, PA 15213     ||  UUCP:   !harvard!andrew.cmu.edu!dg1v
--------------------------------------------------------------------
"You're welcome to use my opinions, just don't get them all wrinkled."
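[The PAC model makes "degree of confidence" quantitative: demand error at most epsilon with probability at least 1 - delta, rather than 100% accuracy with 100% confidence.  The sketch below is the textbook sample-complexity bound for a finite hypothesis class and a consistent learner, illustrative code rather than anything from Valiant's paper itself.]

```python
import math

def pac_sample_bound(num_hypotheses, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite
    hypothesis class H to reach error <= epsilon with probability
    >= 1 - delta:  m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)

# The 16 boolean functions of two inputs, 10% error, 95% confidence:
print(pac_sample_bound(16, 0.10, 0.05))   # 58
```

Note how the bound grows only logarithmically in the number of hypotheses but linearly in 1/epsilon -- tightening the required accuracy is what costs examples, not enlarging the class.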
pluto@beowulf.ucsd.edu (Mark E. P. Plutowski) (08/31/89)
In article <1683@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>Perhaps our definitions of "learnable" are different.  Mine is that,
>with a fraction of the possible samples, one can generalize to 100%
>accuracy.

I would suggest you introduce one of two things into your definition of
"learnability."  Either 1) make it dependent on a particular
architecture, or, 2) make it dependent upon a complexity measure, such
as time needed to learn, number of units, or number of weights.

It makes sense to discuss learnability when we talk about a set of
concepts (e.g., functions) given a hypothesis space with which to learn
the concepts (e.g., threshold-logic).  Then, it is an issue whether the
concept in question can be represented within the hypothesis space --
and then, we can discuss whether it can be learned given a feasible
amount of resources.
apr@cbnewsl.ATT.COM (anthony.p.russo) (08/31/89)
> >Perhaps our definitions of "learnable" are different.  Mine is that,
> >with a fraction of the possible samples, one can generalize to 100%
> >accuracy.  Otherwise, for example, if after 99 of 100 samples [...]

In article <1989Aug30.162345.9569@elroy.jpl.nasa.gov>, mathew@jane.Jpl.Nasa.Gov (Mathew Yeates) writes:
> under this definition, no function is learnable.  Considering a function as
> a set of ordered pairs, seeing some subset of the ordered pairs can
> never lead to a complete knowledge of the entire set.  At least for
> a mere mortal.  A rigorous definition of learnability from examples can
> be found in Valiant's "A Theory of the Learnable".

Yes, I should stick to Valiant's paper.  However, one thing: a known
function is, in most cases, more than a set of ordered pairs.  It is a
set of ordered pairs related in some way.  Discover the relation and no
further examples are needed to have complete knowledge of the entire
set.

The question of "learnability" is whether the function, or rule, can
possibly be extracted from a set of samples *with probability of error
less than some value*.  This is one mistake I made when I said "100%
confidence" -- that's not possible.  However, getting back to X-OR, this
probability of error = 0.5, clearly too high to consider it learnable.

Aside: someone commented that learnability should be defined dependent
on architecture.  I don't see why.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  apr@cbnewsl.ATT.COM     put disclaimer here                   ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pluto@beowulf.ucsd.edu (Mark E. P. Plutowski) (09/01/89)
In article <1697@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>... someone commented that learnability should be defined dependent
>on architecture.  I don't see why.

I mentioned this for concreteness' sake.  More generally, you want to
know whether the set of hypotheses your learning protocol has at its
disposal can represent the function you wish to learn to a satisfactory
level of approximation -- is the hypothesis space sufficient to
characterize the set of concepts you wish to learn?

A particular architecture can represent a subset of some concept space
-- this is its hypothesis space -- and it may be capable of effectively
learning only some subset of its hypothesis space.  In the case you
mention, a single 2-input threshold gate cannot learn x-or because it
cannot represent x-or.  A 2-input threshold gate network with at least
1 hidden unit may not be able to effectively (feasibly, satisfactorily)
learn x-or, perhaps due to the ineffectiveness of the learning law.

With the definition of learning you initially proposed, we can say very
little about learning -- except that it will be impossible for all
practical purposes.
ackley@wind.bellcore.com (David H Ackley) (09/01/89)
In article <1697@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>Yes, I should stick to Valiant's paper.  However, one thing: a known function
                                                              ^^^^^^^^^^^^^^
>is, in most cases, more than a set of ordered pairs.  It is a set of ordered
>pairs related in some way.

By referring to "known functions" you are implicitly incorporating a
bias in your notion of learnability --- i.e., you are preferring
functions that have some (presumably simple) relation between input and
output.  I say "presumably simple" because I imagine you want to rule
out properties like "appearing in the same truth table" as sufficient
grounds for being "related in some way".

Can you make your criteria for "known-ness" explicit?

| David Ackley         Cognitive Science Research Group |
| "There are        Bell Communications Research Inc.   |
|  no facts,             ackley@flash.bellcore.com      |
|  only factors"               ...!bellcore!ackley      |
apr@cbnewsl.ATT.COM (anthony.p.russo) (09/01/89)
In article <17538@bellcore.bellcore.com>, ackley@wind.bellcore.com (David H Ackley) writes:
> In article <1697@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
> >Yes, I should stick to Valiant's paper.  However, one thing: a known function
>                                                               ^^^^^^^^^^^^^^
> >is, in most cases, more than a set of ordered pairs.  It is a set of ordered
> >pairs related in some way.
>
> By referring to "known functions" you are implicitly incorporating a
> bias in your notion of learnability --- i.e., you are preferring
> functions that have some (presumably simple) relation between input
> and output.  I say "presumably simple" because I imagine you want to
> rule out properties like "appearing in the same truth table" as
> sufficient grounds for being "related in some way".

I could have omitted the word "known" and had a perfectly legitimate
statement.  That is, ANY deterministic FUNCTION is a set of related
ordered pairs, whether known or not.

> Can you make your criteria for "known-ness" explicit?

No.  You bring up a good point, because your argument is really, "What
functions shall we consider in the hypothesis space?"

I can't tell you; this appears to be getting very deep.  On the surface,
it seems that biases are !necessary! for learning anything at all.  If
so, then the biases are probably hard-wired and not learned, since they
would have to be learned in terms of other biases, etc.

Does this make sense to anyone else, or have I gone off the deep end?

	~ tony ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  apr@cbnewsl.ATT.COM     put disclaimer here                   ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pluto@beowulf.ucsd.edu (Mark E. P. Plutowski) (09/02/89)
In article <1727@cbnewsl.ATT.COM> apr@cbnewsl.ATT.COM (anthony.p.russo) writes:
>I can't tell you; this appears to be getting very deep.  On the surface,
>it seems that biases are !necessary! for learning anything at all.
>If so, then the biases are probably hard-wired and not learned, since
>they would have to be learned in terms of other biases, etc.
> ~ tony ~

Bias is indeed necessary to learn: if there is no bias, then there is no
criterion for selecting one hypothesis over another, and the learning
algorithm can do no better than random selection, on average.
bill@boulder.Colorado.EDU (09/04/89)
  Is it necessary to have a bias in order to be able to learn?

  Not always.  If the training set includes an example of every
possible input, then the device only needs to be able to "memorize"
the correct responses -- it doesn't need a preset bias.

  If, though, there are inputs not included in the training set, then
the device can only produce responses by somehow "generalizing" from
what it has seen, and the rules it uses for generalization constitute
a bias.  (I actually prefer the word "heuristic" to "bias".  What I am
saying here is that any device that learns from incomplete data must
use some sort of heuristic.)

  For devices that learn solely from examples -- which includes all the
neural networks I've seen -- that's all there is to it.  It's not a
deep question at all.

  It may be more interesting to wonder about more "intelligent"
devices, which are capable of learning from commentary as well as
examples.  ("Commentary" is, e.g., your English teacher telling you
that all proper nouns should be capitalized.)  Even this, though, does
not change the general principle.  A comment may be thought of as a
constraint upon the set of possible responses.  If all of the examples,
plus all of the constraints, do not force a unique response to every
possible input, then the device, in order to pick a response, must use
some sort of bias.
apr@cbnewsl.ATT.COM (anthony.p.russo) (09/05/89)
In article <11308@boulder.Colorado.EDU>, bill@boulder.Colorado.EDU writes:
> Is it necessary to have a bias in order to be able to learn?
>
> Not always. [...]
>
> If, though, there are inputs not included in the training set,
> then the device can only produce responses by somehow "generalizing"
> from what it has seen, and the rules it uses for generalization
> constitute a bias.  (I actually prefer the word "heuristic" to
> "bias".  What I am saying here is that any device that learns from
> incomplete data must use some sort of heuristic.)

From what I have thought about and read here, it seems that learning
actually constitutes learning *biases* and nothing else (unless it is
memorization).  *IF* this is true (and I'm not sure it is), then for a
classifier, from an information-theoretic point of view, it then makes
the most sense to give the network examples that are on the borderline
of a rule or class.  These "borderline" patterns should contain more
information about biases than good examples of a class.  In ways this
makes sense -- it is akin to telling the network only where the
boundaries between classes are.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  Tony Russo              " Surrender to the void."             ~
~  apr@cbnewsl.ATT.COM     put disclaimer here                   ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pluto@beowulf.ucsd.edu (Mark E. P. Plutowski) (09/06/89)
In article <11308@boulder.Colorado.EDU> bill@synapse.Colorado.EDU (Bill Skaggs) writes:
>
> Is it necessary to have a bias in order to be able to learn?
>
> Not always.  If the training set includes an example of every
>possible input, then the device only needs to be able to "memorize"
>the correct responses -- it doesn't need a preset bias.

This is a strong form of a learning bias.  The device assumes that
everything it sees during training is relevant, and sufficient.

> It may be more interesting to wonder about more "intelligent"
>devices, which are capable of learning from commentary as well as
>examples.  ("Commentary" is, e.g., your English teacher telling you
>that all proper nouns should be capitalized. ...

You might be interested in some approaches to learning theory in which
the device has access to a teacher that can provide more than just an
error signal ... e.g., once a hypothesis is formed by the device, the
teacher, rather than just saying whether or not the hypothesis is
correct for given values of inputs, can provide counterexamples to the
device.  Of course, this requires a teacher with expert knowledge!  And
the ability to ascertain what hypothesis the device currently
entertains.

Applying this idea to neural networks is difficult.  The question is:
how to apportion this type of training signal to the appropriate units
in the network.  And, once received by the pertinent units, what to
make of it.  Any ideas on how to backpropagate such training
information?  "Commentary" feedback can apply to the entire hypothesis
formed by the device, not just its performance on a particular input.
chrisley@kanga.uucp (Ron Chrisley UNTIL 10/3/88) (09/06/89)
Tony Russo writes:
"*IF* this is true (and I'm not sure it is), then for a classifier,
from an information-theoretic point of view, it then makes the most sense
to give the network examples that are on the borderline
of a rule or class. These "borderline" patterns should contain more
information about biases than good examples of a class.
In ways this makes sense -- it is akin to telling the network only where the
boundaries between classes are."
This is exactly the learning procedure behind Kohonen's LVQ2 (Learning Vector
Quantization) algorithm. See his paper in ICNN '88. He derives the optimality
of this method from Bayesian arguments. I do not see how the fact that
generalization = bias implies the optimality of learning the boundary
conditions, and would be very interested in having you elaborate on why you
think it might.
Ron Chrisley
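[For readers who have not seen LVQ2, the window idea can be sketched roughly as follows.  This is illustrative code, not Kohonen's exact formulation -- his algorithm differs in details such as the window definition and learning-rate schedule.  The point is that an update fires only when an input falls near the midplane between its two nearest codebook vectors and those two carry different class labels, so learning concentrates on the boundary.]

```python
import numpy as np

def lvq2_update(x, label, codebooks, classes, lr=0.1, window=0.3):
    """One LVQ2-style step (a sketch, not Kohonen's exact algorithm).
    If the two nearest codebook vectors carry different classes and x
    falls in a window around their midplane, move the correctly-labeled
    one toward x and the incorrectly-labeled one away from it."""
    d = np.linalg.norm(codebooks - x, axis=1)
    i, j = np.argsort(d)[:2]                  # two nearest codebooks
    in_window = d[i] / d[j] > (1 - window) / (1 + window)
    if in_window and classes[i] != classes[j]:
        for k in (i, j):
            sign = 1.0 if classes[k] == label else -1.0
            codebooks[k] += sign * lr * (x - codebooks[k])
    return codebooks

# Toy use: two 1-D codebooks, and a borderline example of class 1.
cb = np.array([[0.0], [1.0]])
cb = lvq2_update(np.array([0.45]), 1, cb, [0, 1])
```

Inputs deep inside a class region fail the window test and leave the codebooks untouched, which is the algorithmic form of "only the borderline patterns carry the information."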
bill@boulder.Colorado.EDU (09/06/89)
> You might be interested in some approaches to learning theory
> in which the device has access to a teacher that can provide more
> than just an error signal ... e.g., once a hypothesis is formed by
> the device, the teacher, rather than just saying whether or not
> the hypothesis is correct for given values of inputs, can provide
> counterexamples to the device.  Of course, this requires a teacher
> with expert knowledge!  And the ability to ascertain what
> hypothesis the device currently entertains.
>
> Applying this idea to neural networks is difficult.
> The question is: how to apportion this type of training
> signal to the appropriate units in the network.  And, once
> received by the pertinent units, what to make of it.
> Any ideas on how to backpropagate such training information?
> "Commentary" feedback can apply to the entire hypothesis formed
> by the device, not just its performance on a particular input.

In principle, if you can come up with the training information, you can
use back-prop (or something very much like it) to apply it.  Back-prop
is essentially gradient descent for an error function.  Usually the
error function is either the total squared error in the output layer
for a randomly chosen input (this is "online" back-prop), or the
average total squared error for a set of inputs (which is "batch-mode"
back-prop).  But as far as the mathematics is concerned, the error
function can be anything you please.  For a neural network, the
"hypothesis" it forms is encoded by its weights, and any sort of
"comment" you please can be jammed into the error function, if only you
can formulate it as a numerical measure calculable from the weights.

In fact, a few people have already been trying to do that sort of
thing.  As you probably know, a persistent problem with back-prop is
that it tends to give networks that don't generalize very well to new
inputs.
One hope for getting better performance is to augment the error
function with an extra term representing the "complexity" of the
network, since it seems intuitively reasonable that simpler networks
should generalize better.  It isn't obvious how best to measure
complexity: maybe the sum of the magnitudes of all the weights; or
maybe the number of weights exceeding some fixed threshold; or maybe
something else.  This is work in progress, but there have been some
promising beginnings.  (Perhaps somebody directly involved can
comment.)
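[A minimal sketch of the "extra term" idea, using illustrative code rather than anything posted here: for a single linear unit, adding an L2 complexity penalty lam * ||w||^2 to the squared error simply adds 2 * lam * w to the gradient (this is what is now usually called weight decay).  The linear case is chosen because the penalized minimum has a closed form, so the descent can be checked.]

```python
import numpy as np

# Gradient descent on an error function augmented with a complexity
# term:  E(w) = ||X w - y||^2 + lam * ||w||^2   (L2 weight decay).
# For this linear case the minimizer has the closed form
# w* = (X^T X + lam I)^{-1} X^T y, so we can verify convergence.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
lam = 0.5

w = np.zeros(3)
lr = 0.005
for _ in range(4000):
    grad = 2.0 * X.T @ (X @ w - y) + 2.0 * lam * w   # decay term: 2*lam*w
    w -= lr * grad

w_star = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

The penalized solution always has a smaller norm than the unpenalized least-squares fit, which is the sense in which the extra term pushes the learner toward "simpler" hypotheses; for a multi-layer net the same term is just added to each weight's back-propagated gradient.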
chrisley@kanga.uucp (Ron Chrisley UNTIL 10/3/88) (09/06/89)
In article <11308@boulder.Colorado.EDU> bill@synapse.Colorado.EDU (Bill Skaggs) writes:
>
> Is it necessary to have a bias in order to be able to learn?
>
> Not always.  If the training set includes an example of every
>possible input, then the device only needs to be able to "memorize"
>the correct responses -- it doesn't need a preset bias.

This assumes, as has much of the discussion on this topic, that one is
dealing with deterministic classification problems, that is, ones in
which the proper response to an input is constant.  But many problems,
such as the classification of natural signals like speech, are
statistical, in that at different times the same point will be mapped
to different classes.  Then you can at best model the discriminant
function for each of the categories, and then, given a point, choose
the class that is most probable.  This requires more than one instance
per sample in order to use the memorization technique.

So either you abandon memorization (since there are usually an infinite
# of inputs anyway, and you don't know in advance what an effective
discretization of the input space would be) and therefore use a bias,
or, in the cases where memorization is possible, you will have to look
at more than one instance of each input (I'm wondering: isn't this how
KNN or k-means classifiers, or even codebooks, work?).  But what will
be the proper number of instances to look at?  That cannot be
determined in advance, so that may be yet another way bias can creep
in...

Ron Chrisley
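[The "more than one instance per input" point can be made concrete with a toy estimator -- illustrative code, not from the thread.  When the same input recurs with different labels, tally the labels seen for each input and answer with the most frequent one, an empirical stand-in for choosing the most probable class.]

```python
from collections import Counter, defaultdict

def train_majority(samples):
    """Tally the labels observed for each input and respond with the
    most frequent one -- an empirical approximation to 'choose the
    class that is most probable', usable only when inputs recur."""
    counts = defaultdict(Counter)
    for x, c in samples:
        counts[x][c] += 1
    return {x: cnt.most_common(1)[0][0] for x, cnt in counts.items()}

# The same point (0, 1) maps to class 'A' 7 times and to 'B' 3 times,
# so plain memorization of single pairs would be ill-defined here.
samples = [((0, 1), 'A')] * 7 + [((0, 1), 'B')] * 3 + [((1, 1), 'B')] * 5
classify = train_majority(samples)
```

How many repeated instances are enough for the majority vote to be trustworthy depends on how close the class probabilities are to 50/50 -- which is exactly Chrisley's point that the choice cannot be fixed in advance without some bias.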
zmacv61@flamingo.doc.ic.ac.uk.doc.ic.ac.uk (L K C Leighton) (09/09/89)
I am new to neural networks, and to news.  Dubious as I was about
sending a message over the network, I had to query this:

>X-OR is *memorizable*, not learnable.
>In either case, learnability is not a requirement for neural net applicability.
>Nets memorize well.

I thought the whole idea of neural nets was to provide learning
abilities, not just a memorizing function.  Computers these days are
fast enough as it is, and with good programming style, a good database
can be provided which would far outstrip any neural network.

Perhaps someone could enlighten me on this, as the programs I have been
working on (limited to 1000 neurons approx) have centred around
imitating the brain.  I saw memory recall-type neural nets as limited,
as they have no means to link one neural pattern to another (thought
processing we do all the time).

Also - does anyone have any programs along these lines, or any others?
As I am new to this subject, I have not yet tackled any books or
articles on it, and a lot of the terminology is lost on me.  Programs I
can understand :-)

many thanks

luke leighton.

| Luke Leighton @ Imperial College           | zmacv61@uk.ac.ic.doc     |
| they said it was impossible. i agreed.     | tharg@uk.ac.ic.cc        |
| i said it was impossible. they disagreed.  | zdac131@uk.ac.kcl.cc.elm |
jagota@sybil.cs.buffalo.edu (Arun Jagota) (09/12/89)
In article <11308@boulder.Colorado.EDU> bill@synapse.Colorado.EDU (Bill Skaggs) writes:
>
> Is it necessary to have a bias in order to be able to learn?
>
> Not always.  If the training set includes an example of every
>possible input, then the device only needs to be able to "memorize"
>the correct responses -- it doesn't need a preset bias.

Even then, it might be *useful* to consider the complexity of the
representation (the functional form) of the learned function that the
device forms.  A device will exhibit good functional-form learning
ability if it forms a *good* representation.  Since what constitutes
the *best* representation (min # of params, length of representation in
bits, etc.) is a subjective matter, this amounts to needing a bias.

Moreover, an unbiased representation will be a table look-up, which has
no generalisation capability (whereas a *better* representation should
have some generalisation capability).

CSNET  : jagota@cs.buffalo.edu
BITNET : jagota@sunybcs