[comp.ai.neural-nets] NN solution of non-deterministic problems. Doable or stupid?

hallyb@globbo.enet.dec.com (John Hallyburton) (08/01/90)

There's a lot of brainpower in this newsgroup; please forgive me if
this turns out to be a stupid question.  I don't mean to lower the
average level of discussion!

I'd like to know if it's possible to use neural nets to solve problems
that aren't fully deterministic, that is, similar inputs produce two or
more different outputs in different training cases.

One simple example is in the field of weather forecasting.  Suppose we
want to forecast next week's Duluth rainfall.  We might input various
training cases over the last 40 years, including such data as the local
rainfall last week, two weeks ago, ..., last winter's snowfall in the
Rockies, the temperature of the Pacific Ocean at various stations, the
phase of the moon, etc.; the list goes on and on.

When you boil it down to basics, so to speak, you will end up with
training cases that have repeated inputs and different outputs.  What
could I expect a good neural net program to produce?  Taking a simple
example where A, B, and C are constants and there is only one output
(rainfall total in inches), history might give us the following cases:

	A, B, C,    0
	A, B, C,    5
	A, B, C,    5
	A, B, C,    6

Would most neural nets decide the correct output for (A, B, C) is the
average, 4?

Would there be any way to construct extra outputs that serve as a sort
of confidence indicator, so that the reader could see not only a forecast
"4 inches of rain expected next week", but also get a feeling for its
accuracy, ideally yielding a forecast like "75% chance of rain with
about 5 inches expected.  Get that corn planted"?
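
To make the idea concrete, here is the sort of thing I imagine, as a
toy sketch (in modern Python, and purely hypothetical on my part): a
second output trained alongside the first to track the spread of the
targets, under a Gaussian error model.  I have no idea whether this is
how people actually do it.

import math

targets = [0.0, 5.0, 5.0, 6.0]
mu, s = 0.0, 0.0     # first output: mean; second output: log-variance
lr = 0.05

for epoch in range(20000):
    dmu = ds = 0.0
    for t in targets:
        var = math.exp(s)
        # gradients of the Gaussian negative log-likelihood
        # 0.5*s + (t - mu)**2 / (2*var)
        dmu += (mu - t) / var
        ds += 0.5 * (1.0 - (t - mu) ** 2 / var)
    mu -= lr * dmu / len(targets)
    s -= lr * ds / len(targets)

print(mu, math.exp(s))   # about 4.0 and 5.5: the forecast and its spread

The second number is a variance, so its square root (about 2.3 inches)
would be the "give or take" to print next to the forecast.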

If the number of inputs is small, one could code up a solution using more
traditional programming techniques.  But if there are 100 inputs then it
becomes impractical to look at every possible subset of the inputs (there
are 2^100, about 10^30, of them) to determine what was obviously 75% in
the example above.

Any thoughts appreciated.

  John (usual disclaimers)

spoffojj@hq.af.mil (Jason Spofford) (08/01/90)

I'll give you my two cents...

If you plotted the monthly rain accumulation amounts over the last 40
years for Duluth, you would end up with a simple graph of rainfall
amounts over time. This graph would have a line that went up and down
depending on the seasons. What you are asking the NN to do is to
project what is going to happen in the future, based on what happened
in the past. 

	I have not trained NNs (in the traditional approaches), so I
am a little unclear on exactly how to train the NN to perform this
function.  I imagine the input to the NN would be TIME, like the number
of months. You would train the NN on the last forty years by
presenting each month, one at a time, and telling the NN what the
outputs should be (the rainfall amounts).  The NN will hopefully
learn a function that tells you not only what really happened in the
past, but also, when presented with a future TIME value, what will
happen in the future.
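
Something like this toy setup is what I have in mind (a sketch in
modern Python, with made-up seasonal numbers standing in for the real
record; I haven't checked that this is how people normally do it):

import math, random

random.seed(0)

# Made-up stand-in for the rainfall record: a seasonal cycle plus
# noise.  The real 40 years of Duluth data would go here instead.
months = 48
data = [(m / 12.0,                       # TIME input, in years
         3.0 + 2.0 * math.sin(2.0 * math.pi * m / 12.0)
             + random.gauss(0.0, 0.3))   # rainfall output
        for m in range(months)]

H = 12                                   # hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.02

def forward(x):
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return h, b2 + sum(w2[i] * h[i] for i in range(H))

# plain backpropagation on squared error, one case at a time
for epoch in range(5000):
    for x, t in data:
        h, y = forward(x)
        err = y - t                      # d(squared error)/dy
        for i in range(H):
            gh = err * w2[i] * (1.0 - h[i] ** 2)
            w2[i] -= lr * err * h[i]
            w1[i] -= lr * gh * x
            b1[i] -= lr * gh
        b2 -= lr * err

# "Forecast" the next month by feeding in a future TIME value; how far
# such an extrapolation can be trusted is exactly the question.
print(forward(months / 12.0)[1])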

	There are so many variables in weather that the forecasting
performance of a NN trained as stated above is likely to be poor. 
I hope this info is useful.

--
----------------------------------------------------------
)       Jason Spofford <((((((> spoffojj@hq.af.mil       (
)       LAN Manager       George Mason Univ. Grad. Stud. (
----------------------------------------------------------

mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) (08/02/90)

In article <spoffojj.649522001@lgn> spoffojj@hq.af.mil (Jason Spofford) writes:
>If you plotted the monthly rain accumulation amounts over the last 40
>years for Duluth, you would end up with a simple graph of rainfall
>amounts over time. This graph would have a line that went up and down
>depending on the seasons. What you are asking the NN to do is to
>project what is going to happen in the future, based on what happened
>in the past. 
>
>	I have not trained NNs (in the traditional approaches), so I
>am a little unclear on exactly how to train the NN to perform this
>function.

David Rogers from RIACS in Mountain View, CA recently presented some
results on characterizing rainfall in N. Australia using Kanerva's SDM
(sparse distributed memory).  The problem he addressed looked more like
regression than prediction, but he reported some good results on a very
large problem.  If I remember correctly, his approach resembles genetic
search.

The original query was:
> Message-ID: <14121@shlump.nac.dec.com>
> I'd like to know if it's possible to use neural nets to solve problems
> that aren't fully deterministic, that is, similar inputs produce two or
> more different outputs in different training cases.

Look at Ivakhnenko and Lapa's book on Forecasting and Prediction
Techniques.  [I don't have the complete reference here.]  Sometimes
you can model the deterministic and stochastic parts
separately.  At other times, you might want to start from random
initial behavior and bias it towards deterministic behavior.
You will most definitely need stochastic units in the network(s) you
use.
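
For a feel of what a stochastic unit is, here is a minimal sketch
(modern Python, just illustrative): a binary unit that fires with
probability given by the sigmoid of its net input, as in Boltzmann
machines.

import math, random

def stochastic_unit(net_input, temperature=1.0):
    p = 1.0 / (1.0 + math.exp(-net_input / temperature))
    return 1 if random.random() < p else 0

fires = sum(stochastic_unit(0.5) for _ in range(10000))
print(fires / 10000.0)   # close to sigmoid(0.5), about 0.62

Repeated presentations of the same input then produce a distribution
of outputs rather than a single value, which is the behavior the
original query asks about.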


Pankaj Mehra
University of Illinois

chrisley@csli.Stanford.EDU (Ron Chrisley) (08/02/90)

Many people in the neural net/PDP community have ignored the non-deterministic
case of pattern-recognition.  I've seen talks/papers that try to provide
all-encompassing frameworks for pattern-recognition in nnets, and yet they
assume things like "there is a 0-error weight-state".  Of course, in truly
non-deterministic problems, there is no such thing as a state that never makes
mistakes.  All one can do is maximize the likelihood of correct
classification.  The same goes for prediction.

Yes, there are probably nets that predict the mean.  Nearest-neighbor
classifiers will probably pick the mode (the output that was most
frequently associated with the input).
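
A quick toy check of the mean claim (a sketch in modern Python; the
input is held constant, so the net's output reduces to a single
adjustable weight w):

targets = [0.0, 5.0, 5.0, 6.0]   # the repeated-input cases from the query
w, lr = 0.0, 0.1

for epoch in range(500):
    # batch gradient of 0.5 * sum((w - t)**2)
    w -= lr * sum(w - t for t in targets) / len(targets)

print(w)   # converges to 4.0, the mean of the conflicting targets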

For example, if the outputs are not predicted inches of rainfall, which
is a continuous variable, but are instead small in number and discrete,
such as weather types like cloudy, windy, clear, etc., then one could
use a nearest-neighbor-style classifier that assigns an input to the
weather class that is most likely, given the history of inputs.

If you are interested in this latter type of discrete prediction, then I
suggest looking at Kohonen's work on LVQ (ICNN '88) as an introduction.
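
For flavor, here is a toy nearest-neighbor-style classifier (modern
Python; this is not LVQ itself, just the basic idea): with conflicting
labels in the history, it returns the most frequent label among the
nearest stored cases.

from collections import Counter

history = [((1.0, 2.0), "cloudy"), ((1.0, 2.0), "clear"),
           ((1.0, 2.0), "cloudy"), ((4.0, 0.0), "windy")]

def classify(x, k=3):
    # sort stored cases by squared distance to x, then take a
    # majority vote among the k nearest
    by_dist = sorted(history, key=lambda case:
                     sum((a - b) ** 2 for a, b in zip(case[0], x)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(classify((1.0, 2.1)))   # "cloudy", the mode for that input region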

For the continuous case, it appears you have to decide what kind of
interpolation function makes sense.  But I don't know much about this
case.  Anyone else?

If using nnets for non-deterministic problems is "stupid" then nnets will be of
limited interest in many domains, such as speech.

Hope this helps.



-- 
Ron Chrisley    chrisley@csli.stanford.edu
Xerox PARC SSL                               New College
Palo Alto, CA 94304                          Oxford OX1 3BN, UK
(415) 494-4728                               (865) 793-484

tap@ai.toronto.edu (Tony Plate) (08/03/90)

In article <6910@ptolemy.arc.nasa.gov> mehra@ptolemy.arc.nasa.gov (Pankaj Mehra) writes:
>In article <spoffojj.649522001@lgn> spoffojj@hq.af.mil (Jason Spofford) writes:
>The original query was:
>
>> Message-ID: <14121@shlump.nac.dec.com>
>> I'd like to know if it's possible to use neural nets to solve problems
>> that aren't fully deterministic, that is, similar inputs produce two or
>> more different outputs in different training cases.
>
>Look at Ivakhnenko and Lapa's book on Forecasting and Prediction
>Techniques.  [I don't have the complete reference here.]  Sometimes
>you can model the deterministic and stochastic parts
>separately.  At other times, you might want to start from random
>initial behavior and bias it towards deterministic behavior.
>You will most definitely need stochastic units in the network(s) you
>use.
>
>
>Pankaj Mehra
>University of Illinois

Just a short comment on the ``most definitely'' part:

It is quite possible to use deterministic nets to ``solve'' problems
that aren't fully deterministic (depending upon what is meant by
``solve''.)  For example, suppose we want a net to output the probability
of a coin turning up heads when tossed.  The network with one output
unit and no inputs whatsoever will perform this task, and can be
trained by gradient descent.

The set of training examples can be either one example, i.e., the observed
probability of turning up heads, e.g., {0.5}, or the unprocessed results
of a number of trials, e.g., {1,0,0,0,1,1,0,1}.

In this case either the sum-of-squares or the asymmetric cross entropy is a
suitable error function - the minimum of both occurs when the
output unit gives the observed probability.  However, for more
complex problems, the softmax output function together with the
asymmetric cross entropy objective function is better in both
theory and practice.
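
A concrete toy version of the coin example (a modern Python sketch;
with no inputs, the unit's only parameter is its bias b):

import math

trials = [1, 0, 0, 0, 1, 1, 0, 1]   # 4 heads in 8 tosses
b, lr = -2.0, 0.1                   # start the bias well away from the answer

for epoch in range(2000):
    y = 1.0 / (1.0 + math.exp(-b))  # the unit's output
    # batch gradient of the cross entropy -sum(t*log(y) + (1-t)*log(1-y))
    # with respect to b; for a sigmoid unit it reduces to sum(y - t)
    b -= lr * sum(y - t for t in trials)

print(1.0 / (1.0 + math.exp(-b)))   # about 0.5, the observed P(heads)

The sum-of-squares version reaches the same minimum in this simple
case, as noted above.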

John Bridle has quite a nice paper in NIPS 2 on using NNets for stochastic
problems; he shows that for a particular type of network, when the objective
function is at its minimum value, the mutual information between the outputs
of the network and the training data is at its maximum.  (Btw, this gives
better discrimination than maximum-likelihood model estimation methods.)

Tony Plate
-- 
---------------- Tony Plate ----------------------  tap@ai.utoronto.ca -----
Department of Computer Science, University of Toronto, 
10 Kings College Road, Toronto, 
Ontario, CANADA M5S 1A4
----------------------------------------------------------------------------