hallyb@globbo.enet.dec.com (John Hallyburton) (08/01/90)
There's a lot of brainpower in this newsgroup; please forgive me if this turns out to be a stupid question. I don't mean to lower the average level of discussion!

I'd like to know if it's possible to use neural nets to solve problems that aren't fully deterministic, that is, where similar inputs produce two or more different outputs in different training cases. One simple example is in the field of weather forecasting. Suppose we want to forecast next week's Duluth rainfall. We might input various training cases over the last 40 years, including such data as the local rainfall last week, two weeks ago, ..., last Winter's snowfall in the Rockies, the temperature of the Pacific Ocean at various stations, the phase of the moon, etc.; the list goes on and on. When you boil it down to basics, so to speak, you will end up with training cases that have repeated inputs and different outputs. What could I expect a good neural net program to produce?

Taking a simple example where A, B, and C are constants and there is only one output (rainfall total in inches), history might give us the following cases:

    A, B, C, 0
    A, B, C, 5
    A, B, C, 5
    A, B, C, 6

Would most neural nets decide the correct output for (A, B, C) is the average, 4? Would there be any way to construct extra outputs that serve as a sort of confidence indicator, so that the reader could see not only a forecast "4 inches of rain expected next week", but also get a feeling for its accuracy, optimally yielding a forecast like "75% chance of rain with about 5 inches expected. Get that corn planted"?

If the number of inputs is small one could code up a solution using more traditional programming techniques. But if there are 100 inputs then it becomes impractical to look at every possible subset of the inputs to determine what was obviously 75% in the example above.

Any thoughts appreciated.

John
(usual disclaimers)
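As a concrete illustration of the question, here is a toy Python sketch of what a squared-error net settles on for those four cases. The p_rain output is a hypothetical extra output trained on a 0/1 did-it-rain indicator (my own construction, not anything standard), and the whole thing is just gradient descent on the four repeated cases:

    cases  = [0.0, 5.0, 5.0, 6.0]   # rainfall, inches, for repeated input (A, B, C)
    rained = [0.0, 1.0, 1.0, 1.0]   # 0/1 "any rain at all" indicator

    amount = 0.0                    # output unit: expected inches
    p_rain = 0.5                    # hypothetical extra output: chance of rain
    lr = 0.02

    for epoch in range(5000):
        # batch gradient steps on sum-of-squares error; the minimum of
        # sum (y - amount)^2 is at the mean of the targets
        amount += lr * sum(y - amount for y in cases) / len(cases)
        p_rain += lr * sum(r - p_rain for r in rained) / len(rained)

    given_rain = sum(y * r for y, r in zip(cases, rained)) / sum(rained)
    print(f"{p_rain:.0%} chance of rain, {amount:.1f} in. expected "
          f"(about {given_rain:.0f} in. when it does rain)")
    # -> 75% chance of rain, 4.0 in. expected (about 5 in. when it does rain)

So the unconditional output converges to the average, 4, while conditioning on the rainy cases recovers the "about 5 inches" figure in the question.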
spoffojj@hq.af.mil (Jason Spofford) (08/01/90)
I'll give you my two cents...

If you plotted the monthly rain accumulation amounts over the last 40 years for Duluth, you would end up with a simple graph of rainfall amounts over time. This graph would have a line that went up and down depending on the seasons. What you are asking the NN to do is to project what is going to happen in the future, based on what happened in the past.

I have not trained NNs (in the traditional approaches) so I am a little unclear on exactly how to train the NN to perform this function. I imagine the input to the NN would be TIME, like the number of months. You would train the NN on the last forty years by presenting each month, one at a time, and telling the NN what the outputs should be (the rainfall amounts). The NN will hopefully attempt to create a function that will tell you not only what really happened in the past, but what will happen in the future too, when presented with a future TIME value.

There are so many variables in weather that the forecasting performance of a NN trained as stated above is likely to be poor.

I hope this info is useful.
--
----------------------------------------------------------
)  Jason Spofford    <((((((>    spoffojj.hq.af.mil      (
)  LAN Manager       George Mason Univ. Grad. Stud.      (
----------------------------------------------------------
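For concreteness, here is a toy sketch of that scheme (Python with numpy, made-up seasonal data). Encoding TIME cyclically as sin/cos of the month-of-year is an added assumption of mine, since a raw month count extrapolates poorly beyond the training range:

    import numpy as np

    rng = np.random.default_rng(0)
    months = np.arange(480.0)                        # 40 years of monthly samples
    rain = 3.0 + 2.0 * np.sin(2 * np.pi * months / 12) \
           + rng.normal(0.0, 0.5, months.size)       # made-up seasonal rainfall

    # cyclic encoding of TIME: sin/cos of the month-of-year
    X = np.column_stack([np.sin(2 * np.pi * months / 12),
                         np.cos(2 * np.pi * months / 12)])

    W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)    # hidden layer
    W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)    # output layer
    lr = 0.1

    for epoch in range(5000):                        # plain backprop on squared error
        h = np.tanh(X @ W1 + b1)
        y = (h @ W2 + b2).ravel()
        gy = ((y - rain) / len(rain)).reshape(-1, 1)
        gh = (gy @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * h.T @ gy;  b2 -= lr * gy.sum(0)
        W1 -= lr * X.T @ gh;  b1 -= lr * gh.sum(0)

    # "forecast" by presenting a future TIME value
    m = 485.0
    xf = np.array([[np.sin(2 * np.pi * m / 12), np.cos(2 * np.pi * m / 12)]])
    print((np.tanh(xf @ W1 + b1) @ W2 + b2).item())  # -> about 4, that month's seasonal level

As Jason says, on real weather this would only recover the seasonal average; the many other variables would show up as noise the net cannot predict from TIME alone.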
mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) (08/02/90)
In article <spoffojj.649522001@lgn> spoffojj@hq.af.mil (Jason Spofford) writes:
>If you plotted the monthly rain accumulation amounts over the last 40
>years for Duluth, you would end up with a simple graph of rainfall
>amounts over time. This graph would have a line that went up and down
>depending on the seasons. What you are asking the NN to do is to
>project what is going to happen in the future, based on what happened
>in the past.
>
>I have not trained NNs (in the traditional approaches) so I
>am a little unclear on exactly how to train the NN to perform this
>function.

David Rogers from RIACS in Mountain View, CA recently presented some results about characterizing rainfall in N. Australia using Kanerva's SDM (Sparse Distributed Memory). The problem he addressed looked more like a regression problem than a prediction problem, but he reported some good results on a very large problem. If I remember correctly, his approach resembles genetic search.

The original query was:

> Message-ID: <14121@shlump.nac.dec.com>
> I'd like to know if it's possible to use neural nets to solve problems
> that aren't fully deterministic, that is, similar inputs produce two or
> more different outputs in different training cases.

Look at Ivakhnenko and Lapa's book on forecasting and prediction techniques. [I don't have the complete reference here.] Sometimes you can model the deterministic part and the stochastic part separately. At other times, you might want to start from random initial behavior and bias it towards deterministic behavior. You will most definitely need stochastic units in the network(s) you use.

Pankaj Mehra
University of Illinois
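For readers unfamiliar with stochastic units, here is a minimal sketch of the Boltzmann-machine-style version (a made-up example, assuming a logistic firing probability): the net input sets a probability, and the unit fires 0 or 1 at random, so identical inputs genuinely produce varying outputs.

    import math, random

    def stochastic_unit(net_input, temperature=1.0):
        # logistic firing probability, as in Boltzmann-machine units
        p_fire = 1.0 / (1.0 + math.exp(-net_input / temperature))
        return 1 if random.random() < p_fire else 0

    # identical input, varying output: net_input = log(3) gives p = 0.75,
    # matching the 75%-chance-of-rain case in the original query
    samples = [stochastic_unit(math.log(3)) for _ in range(10000)]
    print(sum(samples) / len(samples))   # -> roughly 0.75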
chrisley@csli.Stanford.EDU (Ron Chrisley) (08/02/90)
Many people in the neural net/PDP community have ignored the non-deterministic case of pattern recognition. I've seen talks/papers that try to provide all-encompassing frameworks for pattern recognition in nnets, and yet they assume things like "there is a 0-error weight-state". Of course, in truly non-deterministic problems, there is no such thing as a state that never makes mistakes. All one can do is maximize the likelihood of correct classification. The same goes for prediction.

Yes, there are probably nets that predict the mean. Nearest-neighbor classifiers will probably pick the mode (the output that was most frequently associated with the input). For example, if the outputs are not predicted inches of rainfall, which is a continuous variable, but are instead small in number and discrete, such as weather types like cloudy, windy, clear, etc., then one could use a nearest-neighbor style classifier which would assign an input to the weather class that is most likely, given the history of inputs.

If you are interested in this latter type of discrete prediction, then I suggest looking at Kohonen's work on LVQ (ICNN '88) as an introduction.

For the continuous case, it appears you have to decide what kind of interpolation function makes sense. But I don't know much about this case. Anyone else?

If using nnets for non-deterministic problems is "stupid" then nnets will be of limited interest in many domains, such as speech.

Hope this helps.
--
Ron Chrisley                          chrisley@csli.stanford.edu
Xerox PARC SSL                        New College
Palo Alto, CA 94304                   Oxford OX1 3BN, UK
(415) 494-4728                        (865) 793-484
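A minimal sketch of the LVQ1 rule Ron points to (my own reading of Kohonen's update, with made-up codebooks and data): the winning codebook vector moves toward a matching input and away from a mismatching one, so when the same input carries conflicting labels, the classifier settles on the modal class.

    codebooks = [([0.2, 0.2], "clear"), ([0.8, 0.8], "cloudy")]  # (vector, class)

    def nearest(x):
        # index of the codebook vector closest to x (squared Euclidean distance)
        return min(range(len(codebooks)),
                   key=lambda i: sum((v - xi) ** 2
                                     for v, xi in zip(codebooks[i][0], x)))

    def lvq1_step(x, label, lr=0.05):
        i = nearest(x)
        vec, cls = codebooks[i]
        sign = 1.0 if cls == label else -1.0   # attract on match, repel on mismatch
        codebooks[i] = ([v + sign * lr * (xi - v) for v, xi in zip(vec, x)], cls)

    # 75% of the cases at the same input say "cloudy", 25% say "clear"
    training = [([0.5, 0.5], "cloudy")] * 3 + [([0.5, 0.5], "clear")]
    for _ in range(200):
        for x, label in training:
            lvq1_step(x, label)

    print(codebooks[nearest([0.5, 0.5])][1])   # -> "cloudy", the modal class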
tap@ai.toronto.edu (Tony Plate) (08/03/90)
In article <6910@ptolemy.arc.nasa.gov> mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) writes:
>The original query was:
>
>> Message-ID: <14121@shlump.nac.dec.com>
>> I'd like to know if it's possible to use neural nets to solve problems
>> that aren't fully deterministic, that is, similar inputs produce two or
>> more different outputs in different training cases.
>
>Look at Ivakhnenko and Lapa's book on forecasting and prediction
>techniques. [I don't have the complete reference here.] Sometimes
>you can model the deterministic part and the stochastic part
>separately. At other times, you might want to start from random
>initial behavior and bias it towards deterministic behavior.
>You will most definitely need stochastic units in the network(s) you
>use.
>
>Pankaj Mehra
>University of Illinois

Just a short comment on the ``most definitely'' part: it is quite possible to use deterministic nets to ``solve'' problems that aren't fully deterministic (depending upon what is meant by ``solve'').

For example, suppose we want a net to output the probability of a coin turning up heads when tossed. A network with one output unit and no inputs whatsoever will perform this task, and can be trained by gradient descent. The set of training examples can be either one example, i.e., the observed probability of turning up heads, e.g., {0.5}, or the unprocessed results of a number of trials, e.g., {1,0,0,0,1,1,0,1}. In this case either the sum-of-squares or the asymmetric cross-entropy is a suitable error function - the minimum for both occurs when the output unit gives the observed probability. However, for more complex problems, the softmax output function together with the asymmetric cross-entropy objective function is better in both theory and practice.

John Bridle has quite a nice paper in NIPS 2 on using nnets for stochastic problems; he shows that for a particular type of network, when the objective function is at its minimum value, the Mutual Information between the outputs of the network and the training data is at its maximum. (Btw, this gives better discrimination than Maximum Likelihood model estimation methods.)

Tony Plate
--
---------------- Tony Plate ---------------------- tap@ai.utoronto.ca -----
Department of Computer Science, University of Toronto,
10 Kings College Road, Toronto, Ontario, CANADA M5S 1A4
----------------------------------------------------------------------------
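A quick check of Tony's coin example (a toy sketch, assuming a single sigmoid output unit whose only parameter is its bias): for sigmoid plus cross-entropy the bias gradient is simply the mean of (p - t), so gradient descent on the raw trials settles at the observed probability of heads.

    import math

    trials = [1, 0, 0, 0, 1, 1, 0, 1]    # raw coin tosses (4 heads in 8)
    bias = 2.0                           # the net's only parameter; a deliberately bad start
    lr = 0.5

    for epoch in range(1000):
        p = 1.0 / (1.0 + math.exp(-bias))                  # the unit's output
        grad = sum(p - t for t in trials) / len(trials)    # d(cross-entropy)/d(bias)
        bias -= lr * grad

    print(1.0 / (1.0 + math.exp(-bias)))                   # -> 0.5, the observed probability

Note that the net itself stays deterministic throughout; the non-determinism lives entirely in the training data, and the net converges to its probability, which is Tony's point.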